Danish Kapoor

OpenAI raises the bar for voice AI in its API

OpenAI has announced three new models for its Realtime API: GPT-Realtime-2, GPT-Realtime-Translate and GPT-Realtime-Whisper. With these three models, the company lets developers build applications that hold spoken conversations, translate live, and transcribe speech in real time. The range of uses is wide, from customer service to education, and from live events to content creator platforms.

The company says it built the new models to make the real-time audio experience more capable. GPT-Realtime-2 offers more realistic and fluent speech for voice chat applications. OpenAI brings GPT-5-class reasoning to this model. As a result, the model can handle more complex spoken requests that go beyond simple voice commands.

GPT-Realtime-2 builds on the line established by OpenAI’s previous audio model, GPT-Realtime-1.5. According to evaluations shared by the company, the new model scored 15.2 percent higher than its predecessor on the Big Bench Audio test of voice intelligence. On Audio MultiChallenge, which covers areas such as instruction following and multi-turn conversation management, the increase is 13.8 percent. These figures suggest that voice agents will not only respond but also follow context better throughout a conversation.

OpenAI’s second new model, GPT-Realtime-Translate, puts live speech translation into the hands of developers. The model can understand more than 70 input languages and speak in 13 output languages. The company highlights this model for call centers, live classes, conferences, video calls and broadcast environments. The model automatically detects the speaker’s language, the developer selects the target language, and the system produces a text transcript along with the translated audio.

OpenAI offers developers voice chat, translation and live transcription

The third model, GPT-Realtime-Whisper, focuses on live speech-to-text. The model processes speech while the audio stream is still ongoing and delivers text output with low latency. This opens up practical uses for meeting notes, live subtitles, classroom lectures and broadcasts. By changing the latency setting, developers can choose between earlier interim text and higher accuracy.

OpenAI offers these three models through the Realtime API and prices them by usage type. GPT-Realtime-2 is billed in tokens: audio input costs $32 per 1 million tokens, cached audio input costs $0.40 per 1 million tokens, and audio output costs $64 per 1 million tokens. GPT-Realtime-Translate costs $0.034 per minute, and GPT-Realtime-Whisper costs $0.017 per minute. This split makes costs easier to estimate for voice agent, live translation and transcription workloads.
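To make the pricing concrete, here is a minimal Python sketch of how a team might estimate costs from the figures quoted above. The rates come from the article; the function names and the sample token and minute counts are hypothetical, chosen only for illustration, and actual bills may include charges not covered here.

```python
# Rates in USD as quoted in the article. Function names and the sample
# usage numbers below are hypothetical, for illustration only.
AUDIO_INPUT_PER_MILLION = 32.00         # GPT-Realtime-2 audio input
CACHED_AUDIO_INPUT_PER_MILLION = 0.40   # GPT-Realtime-2 cached audio input
AUDIO_OUTPUT_PER_MILLION = 64.00        # GPT-Realtime-2 audio output
TRANSLATE_PER_MINUTE = 0.034            # GPT-Realtime-Translate
WHISPER_PER_MINUTE = 0.017              # GPT-Realtime-Whisper


def realtime2_cost(input_tokens, cached_input_tokens, output_tokens):
    """Estimate GPT-Realtime-2 audio cost in USD from token counts."""
    total = (
        input_tokens * AUDIO_INPUT_PER_MILLION
        + cached_input_tokens * CACHED_AUDIO_INPUT_PER_MILLION
        + output_tokens * AUDIO_OUTPUT_PER_MILLION
    )
    return total / 1_000_000


def per_minute_cost(minutes, rate_per_minute):
    """Estimate cost in USD for a model billed per minute of audio."""
    return minutes * rate_per_minute


# Hypothetical voice agent session: 100k input, 50k cached, 40k output tokens.
print(f"Voice agent: ${realtime2_cost(100_000, 50_000, 40_000):.2f}")
# One hour of live translation and one hour of live transcription.
print(f"Translation: ${per_minute_cost(60, TRANSLATE_PER_MINUTE):.2f}")
print(f"Transcription: ${per_minute_cost(60, WHISPER_PER_MINUTE):.2f}")
```

At these rates, audio output is the dominant cost for a talkative voice agent, while an hour of transcription costs half as much as an hour of translation.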

The company states that the new voice models not only improve speech and translation but also bring additional safety measures. OpenAI says it uses certain triggers against spam, fraud and online abuse risks. The system can end a session when it detects conversations that violate its harmful content guidelines. Developers can also add their own control layers via the Agents SDK.

These models make it possible to build multilingual support scenarios faster, especially for customer service teams. Education platforms can offer subtitles and translation in live lectures, and media organizations can add instant transcription to their broadcast streams. Event companies can translate speeches into different languages, and content creator platforms can build more accessible live broadcasts. OpenAI’s Realtime API move lets teams developing voice applications run more of their workflows through a single API.
