San Francisco: A team of researchers from Google has introduced AudioPaLM, a Large Language Model (LLM) that can tackle speech understanding and generation tasks with greater accuracy.
AudioPaLM is a multimodal architecture that combines the strengths of two established models: PaLM-2 and AudioLM.
PaLM-2 excels at comprehending text-based linguistic knowledge, making it a robust text-oriented language model, whereas AudioLM demonstrates exceptional proficiency in retaining paralinguistic details such as speaker identity and tone.
By combining the two, AudioPaLM harnesses the linguistic expertise of PaLM-2 and the paralinguistic preservation capabilities of AudioLM.
This results in a comprehensive understanding and generation of both text and speech.
To facilitate this integration, AudioPaLM employs a shared vocabulary that effectively represents both speech and text using a finite set of discrete tokens. This unification enables various tasks, including speech recognition, text-to-speech synthesis, and speech-to-speech translation, to be seamlessly integrated within a single architecture and training process.
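The shared-vocabulary idea can be illustrated with a small sketch: text tokens and discrete audio tokens are placed in one token ID space, offset so the ranges never collide, letting a single decoder emit either modality. All names, sizes, and IDs below are illustrative assumptions, not AudioPaLM's actual values.

```python
# Hypothetical sketch of a shared text+audio vocabulary. The vocabulary sizes
# and token IDs are assumptions for illustration only.

TEXT_VOCAB_SIZE = 32_000   # e.g. a SentencePiece text vocabulary (assumed size)
AUDIO_VOCAB_SIZE = 1_024   # e.g. discrete codes from an audio tokenizer (assumed)

def audio_token_id(code: int) -> int:
    """Map a raw audio code (0..AUDIO_VOCAB_SIZE-1) into the shared ID space,
    offset past the text vocabulary so the two ranges never overlap."""
    if not 0 <= code < AUDIO_VOCAB_SIZE:
        raise ValueError(f"audio code out of range: {code}")
    return TEXT_VOCAB_SIZE + code

def is_audio(token_id: int) -> bool:
    """True if a shared-space token ID denotes an audio token."""
    return token_id >= TEXT_VOCAB_SIZE

# A mixed sequence: text tokens followed by audio tokens, as a single
# architecture handling speech recognition or text-to-speech might produce.
sequence = [17, 942, 5, audio_token_id(0), audio_token_id(513)]
print([is_audio(t) for t in sequence])  # [False, False, False, True, True]
```

Because every task's inputs and outputs are just sequences over this one vocabulary, speech recognition, text-to-speech, and speech-to-speech translation can all share the same model and training loop.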
Upon evaluation, AudioPaLM outperformed existing speech translation systems by a significant margin. It also demonstrated zero-shot speech-to-text translation: it can accurately translate speech into text for language combinations it has never encountered before, opening up possibilities for broader language support.
It is unclear when this technology will be implemented into final products, but we can see Google Translate and other apps getting major upgrades through this development.