Technology
Danish Kapoor

Google Gemini 3.5 will instantly translate conversations with Live Translation

Google has announced Gemini 3.5 Live Translation, a new artificial intelligence model focused on real-time speech translation. The model, the company’s latest in voice translation, can translate instantly between more than 70 languages while preserving the speaker’s characteristics, such as intonation, tempo and volume. According to Google’s statement, the system is being rolled out gradually, starting today, across different products for developers, corporate customers and end users.

Google’s translation technologies began nearly two decades ago as one of its first experiments in machine learning. Today, Google Translate and other services translate more than one trillion words every month for billions of users. With Gemini 3.5 Live Translation, the company aims to extend this experience beyond text to natural, uninterrupted voice communication.

In traditional speech translation systems, users must wait for the speaker to finish a sentence before receiving a translation. Gemini 3.5 Live Translation, on the other hand, starts producing the translation while the speaker is still talking, which makes communication more fluid. To keep the translation accurate, the model balances waiting for more context against staying in sync with the speaker. As a result, it can follow the speaker with a delay of just a few seconds, without long silences or unnatural pauses.
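
The trade-off between waiting for context and keeping pace with the speaker can be illustrated with a toy sketch. This is not Google’s method, which the announcement does not describe; it is a simple fixed-lag policy, and the function name `fixed_lag_stream`, the `lag` parameter and the use of `upper()` as a stand-in for translation are all assumptions made for illustration.

```python
from typing import Iterable, Iterator, List


def fixed_lag_stream(source_words: Iterable[str], lag: int) -> Iterator[str]:
    """Yield placeholder translations that trail the speaker by `lag` words.

    Hypothetical illustration only: upper() stands in for a real translation step.
    """
    buffer: List[str] = []
    emitted = 0
    for word in source_words:
        buffer.append(word)
        # Once more than `lag` words are pending, emit one output per new input word.
        if len(buffer) - emitted > lag:
            yield buffer[emitted].upper()
            emitted += 1
    # The speaker has finished: flush everything still pending.
    while emitted < len(buffer):
        yield buffer[emitted].upper()
        emitted += 1


if __name__ == "__main__":
    for out in fixed_lag_stream("where is the nearest station".split(), lag=2):
        print(out)
```

With `lag=0` every word is emitted the moment it arrives, leaving no context to work with. With a `lag` at least as long as the sentence, nothing is emitted until the speaker stops, which mirrors the traditional behavior described above.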

Gemini 3.5 Live Translation is coming to three different Google services

In the first phase, Google is making the new model available on different platforms for developers, businesses and individual users. Developers will be able to access the system in public preview via the Gemini Live API and Google AI Studio. For corporate customers, the Google Meet integration is entering a special preview for selected Google Workspace customers this month. End users will be able to try the new experience through the Google Translate app on Android and iOS.

The model produces real-time translation by processing speech as it is transmitted. Because it can also detect different languages automatically, users do not need to configure language settings manually. The ability to work in noisy environments is another prominent feature of the system. The goal is to deliver more consistent results across usage scenarios such as multilingual meetings, online courses, broadcasts and phone calls.

Google states that companies developing voice translation applications can also benefit from Gemini Live API support. Platforms such as Agora, Fishjam, LiveKit, Pipecat and Vision Agents help create voice translation solutions faster by managing real-time media streaming infrastructure on behalf of developers. This way, developers can focus on user experience rather than infrastructure details.

Grab, one of the company’s business partners, is also among the organizations testing the new model. Grab aims to make communication between drivers and passengers who speak different languages close to real time. With more than 10 million voice calls made through the platform every month, the integration is a notable example of everyday use.

The upcoming Google Meet update is also notable. The speech translation system, which previously supported only five languages, will now support more than 70. In addition, instead of translating only to and from English, it will translate directly between more than 2,000 language combinations. Google also states that it is working on a new interface that makes speech translation easier to access. These features, which will initially be offered as part of a special preview, are planned to reach a wider user base within the year.
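
The “more than 2,000 language combinations” figure is consistent with simple pair counting. As a quick check, assuming exactly 70 languages (the article says “more than 70”, so this is a lower bound) and assuming a combination means an unordered pair of distinct languages, the count works out as follows. The function name `language_pairs` is made up for this sketch.

```python
from math import comb


def language_pairs(n_languages: int) -> tuple[int, int]:
    """Return (unordered pairs, ordered pairs) of distinct languages."""
    unordered = comb(n_languages, 2)             # e.g. Hindi/Japanese counted once
    ordered = n_languages * (n_languages - 1)    # each direction counted separately
    return unordered, ordered


if __name__ == "__main__":
    unordered, ordered = language_pairs(70)
    print(unordered, ordered)  # 2415 4830
```

Seventy languages give 2,415 unordered pairs, which matches the stated “more than 2,000”. Counting each translation direction separately would give 4,830.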

In the Google Translate app, users will be able to use the Live Translation feature by connecting any headset. The system offers a more natural listening experience by preserving the speaker’s tone of voice and speaking style as much as possible. A new “listening mode” for Android users plays translations directly through the phone’s earpiece. When no headset is available, users can therefore listen to the translation by holding the phone to their ear, as in a normal phone call.

Google also emphasizes that all audio produced by artificial intelligence is marked with SynthID technology. This digital watermark, which is inaudible to the human ear, makes AI-generated content detectable and thereby helps limit the spread of false information and fake content. The model’s standout features are wide language support, low latency and natural speech flow; how it performs in real-world use across different languages and scenarios will become clearer in the coming period.
