In this talk, Abraham Sanders will explore how audio language models work and how to use them. He will cover the tokenization of audio waveforms, the creation of autoregressive language models, and the application of audio language models to common tasks such as Automatic Speech Recognition (ASR), Text-to-Speech (TTS), and Speech-to-Speech Machine Translation. He will conclude with a discussion of future-focused applications, including text-guided music generation and full-duplex spoken dialogue agents.
Remote URL
https://tw.rpi.edu/media/foci-genaillm-users-group-llms-audio-applications-01-may-2024
Audience