OpenAI on Thursday launched three new audio models in its API, designed to help developers build more sophisticated voice applications, according to the company's official announcement.
The models include GPT-Realtime-2, described by OpenAI as "our first voice model with GPT-5-class reasoning that can handle harder requests and carry the conversation forward naturally." According to TechCrunch, it improves on its predecessor, GPT-Realtime-1.5, with stronger reasoning for handling more complicated user requests.
OpenAI also introduced GPT-Realtime-Translate, a live translation model that translates speech from more than 70 input languages into 13 output languages while keeping pace with the speaker, according to openai.com. TechCrunch similarly noted that the model is designed to "keep pace" with users in conversation.
The third model, GPT-Realtime-Whisper, provides streaming speech-to-text transcription that captures speech live as the speaker talks, according to both sources.
“Together, the models we are launching move real-time audio from simple call-and-response toward voice interfaces that can actually do work: listen, reason, translate, transcribe, and take action as a conversation unfolds,” OpenAI stated.
According to TechCrunch, OpenAI has built guardrails into the system to prevent misuse, including triggers that can halt conversations found to violate its harmful-content guidelines. The company suggested potential applications in customer service, education, media, events, and creator platforms.