Danish Kapoor

A revolution in voice generation from OpenAI: the Voice Engine platform


OpenAI, which regularly introduces innovations in artificial intelligence (AI), has announced a voice generation platform called Voice Engine. The platform can create a synthetic voice that closely resembles a user's own, using only a 15-second recording of their speech. The generated voice can then read text aloud in various languages, including the user's native language. OpenAI is testing the technology through small-scale deployments to explore useful applications across many sectors, from education to healthcare.

Access to the platform is currently limited. Early users include the education technology company Age of Learning, the visual storytelling platform HeyGen, the frontline health software maker Dimagi, the AI communication app developer Livox, and the healthcare system Lifespan. OpenAI illustrates the technology's potential by sharing examples from these companies' experiments with the platform.

The role of artificial intelligence in voice generation

OpenAI says it began developing the Voice Engine technology in late 2022, and that it already powers ChatGPT's Read Aloud feature as well as the preset voices in the text-to-speech API. The model is trained on a mix of licensed and publicly available data, and is currently available to only about 10 developers.
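To make the text-to-speech API mentioned above concrete, here is a minimal sketch of how a request to OpenAI's public `/v1/audio/speech` endpoint is shaped, using one of the preset voices. This is an illustrative sketch, not OpenAI's reference client: the field names follow the documented REST API, the `alloy` voice name is one of the published presets, and the request is only constructed here, since actually sending it requires an API key.

```python
import json
import os
import urllib.request

def build_tts_request(text, voice="alloy", model="tts-1"):
    """Prepare (but do not send) a text-to-speech API request.

    The endpoint and JSON fields follow OpenAI's public REST API;
    the API key is read from the environment if present.
    """
    payload = json.dumps({
        "model": model,   # text-to-speech model
        "voice": voice,   # one of the preset voices
        "input": text,    # text to be read aloud
    }).encode("utf-8")
    return urllib.request.Request(
        "https://api.openai.com/v1/audio/speech",
        data=payload,
        headers={
            "Authorization": f"Bearer {os.environ.get('OPENAI_API_KEY', '')}",
            "Content-Type": "application/json",
        },
    )

req = build_tts_request("Hello from the text-to-speech API.")
print(req.full_url)  # the request is ready to send with urllib.request.urlopen
```

Sending the request (with a valid key) returns an audio stream, which a client would typically write to an MP3 file.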

Generative AI research on audio has often focused on music or natural sounds; comparatively little work has targeted voice cloning. OpenAI notes that the US government has already taken steps against unethical uses in this area: the US Federal Communications Commission banned robocalls using AI-cloned voices after a spam campaign imitated President Joe Biden's voice.

OpenAI's partners have agreed to usage policies that prohibit impersonating people or organizations without consent, require the "explicit and informed consent" of the original speaker, bar them from building tools that let individual users create their own voices, and oblige them to disclose to listeners that the voices are AI-generated. In addition, OpenAI adds digital watermarks to the audio so its origin can be traced, and actively monitors how the generated audio is used.

OpenAI also suggests steps that could limit the risks around such tools: phasing out voice-based authentication for access to bank accounts, policies protecting the use of individuals' voices in AI, broader public education about AI-generated fakes, and the development of tracking systems to monitor AI content.
