Google has launched Gemini 3.5 Live Translate, its latest audio model for real-time speech-to-speech translation.


The model can automatically detect more than 70 languages and produce smooth, natural-sounding translated speech while preserving the speaker’s intonation, rhythm, and pitch, Caliber.Az reports, citing the company’s blog.


Unlike turn-based systems that wait for a speaker to finish before responding, Gemini 3.5 Live Translate generates speech continuously, balancing the need for contextual accuracy with real-time responsiveness.


The result is fluid audio output with minimal delay: the translation typically trails the speaker by only a few seconds.


By processing speech as it is streamed, the system enables more seamless cross-language communication.
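The difference between the two approaches can be illustrated with a toy latency model. The sketch below is not Google's implementation and does not call any Gemini API; the names `Chunk`, `turn_based_lags`, `streaming_lags`, and the `lookahead` parameter are all hypothetical, and the assumption that a streaming translator waits for a fixed number of later chunks as context is made purely for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    text: str
    spoken_at: float  # time (seconds) at which the speaker finished this chunk


def make_utterance(words: List[str], seconds_per_word: float = 1.0) -> List[Chunk]:
    """Build a toy utterance in which each word takes a fixed time to say."""
    return [Chunk(w, (i + 1) * seconds_per_word) for i, w in enumerate(words)]


def turn_based_lags(chunks: List[Chunk]) -> List[float]:
    """Turn-based model: nothing is emitted until the speaker has finished."""
    end = chunks[-1].spoken_at
    return [end - c.spoken_at for c in chunks]


def streaming_lags(chunks: List[Chunk], lookahead: int) -> List[float]:
    """Streaming model (assumed): a chunk is emitted once `lookahead` later
    chunks have been heard, or when the speech ends, whichever comes first."""
    last = len(chunks) - 1
    lags = []
    for i, c in enumerate(chunks):
        ready_index = min(i + lookahead, last)
        lags.append(chunks[ready_index].spoken_at - c.spoken_at)
    return lags


if __name__ == "__main__":
    utterance = make_utterance("where should I wait for the car".split())
    print("turn-based worst lag:", max(turn_based_lags(utterance)))
    print("streaming worst lag: ", max(streaming_lags(utterance, lookahead=2)))
```

In this model the worst-case lag of the turn-based approach grows with the length of the utterance, while the streaming approach stays bounded by the amount of context it waits for, which is the trade-off between contextual accuracy and responsiveness described above.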


It supports multilingual input without requiring manual configuration, while its noise resistance allows it to function effectively in loud and unpredictable environments.


It can be used for live interpretation in multilingual calls, meetings, classrooms, broadcasts, and other settings.


Developer platforms such as Agora, Fishjam, LiveKit, Pipecat, and Vision Agents can use the Gemini Live API to integrate real-time voice translation into their applications more easily. These platforms handle the complex media streaming infrastructure so developers can focus on user experience.


Google’s partners at Grab are currently testing the technology to support near real-time multilingual communication between drivers and passengers during pickups. Users of the service collectively make over 10 million voice calls per month.


By Bakhtiyar Abbasov