Google has unveiled a new artificial intelligence language model, AudioPaLM, that can listen, speak, and translate with high accuracy and speed.
As the name indicates, the new model builds on the strengths of two existing Google models, PaLM-2 and AudioLM.
PaLM-2 is a large language model (LLM) that can understand and generate text in a human-like way, while AudioLM focuses on acoustic qualities such as preserving the speaker’s identity and tone of voice.
Google researchers built AudioPaLM by combining these two models, allowing it to generate both text and speech with high efficiency.
The new AudioPaLM model can perform many different functions, such as translating audio from one language to another while preserving the speaker’s tone of voice, and capturing voices or spoken commands and then reproducing them in other languages.
The model also recognizes speech and transcribes it into text, and according to the researchers working on it, it can translate language pairs it has never been exposed to before with great accuracy.
Google’s model is still in research and development, and it is not yet known when it will be available to the public.
Meta, for its part, previously announced an artificial intelligence model called Voicebox that can perform speech-generation tasks such as efficiently editing audio clips and producing speech in different languages from short audio samples.