OpenAI Unveils AI Voice Cloning Tool
OpenAI has unveiled a preview of Voice Engine, a model that can clone human voices from a 15-second audio sample and generate natural-sounding speech.
The most important aspect is the input: if the voice in the input is emotional, the output sounds amazing. I'd like an AI that enhances voice input to make it more emotional.
In the cases where the generated audio sounded low-quality, the original didn't sound like a studio recording either. I guess it was doing as well as it could with what it had to work with.
I'm amazed that you can "give someone back their voice" from such a small amount of audio. The way people are always recording themselves these days, most of us probably have at least 15 seconds on hand; if not, put some aside as an archive for the future, just in case.
The details:
- The model is able to preserve the accent and emotions of the original speaker in generated speech.
- Voice Engine is currently being tested by a small group of trusted partners, including AI startup HeyGen.
- OpenAI has implemented safety measures like watermarking and proactive monitoring to prevent misuse.
- The company revealed it first developed the tech in late 2022 and has been using it to power voices in its text-to-speech API and ChatGPT (a minimal call to that API is sketched below).
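Voice Engine itself has no public endpoint, but the text-to-speech API it reportedly powers is available today. The snippet below is only a minimal sketch of that existing TTS surface, assuming the official openai Python SDK and an OPENAI_API_KEY in the environment; `tts-1` and `alloy` are the documented stock model and voice names rather than anything Voice Engine specific, and the output filename is arbitrary.

```python
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

# Call the stock text-to-speech endpoint; voice cloning itself is not exposed here.
response = client.audio.speech.create(
    model="tts-1",   # documented TTS model name
    voice="alloy",   # one of the preset voices
    input="This is the text-to-speech endpoint that Voice Engine reportedly powers.",
)

# The response body is raw audio (MP3 by default); write it to disk.
with open("speech.mp3", "wb") as f:
    f.write(response.content)
```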
Why it matters: OpenAI is clearly far ahead in the space, with Voice Engine being deployed internally since 2022. However, with no public release in sight, the company seems to understand the risks, such as deepfake scams during an election year.
OpenAI isn't alone in this race. Other companies are developing similar technology, but OpenAI seems to be ahead of the curve: it has been working on Voice Engine since late 2022 and has already used it to power the voices in its text-to-speech API and ChatGPT. That early lead could give it a significant advantage as voice technology spreads across industries.
However, with great power comes great responsibility. OpenAI understands the potential dangers of Voice Engine. Malicious actors could use it to create deepfakes, fabricate news stories, or impersonate people for financial gain. Imagine a political candidate's voice being used to spread misinformation during an election. Scary, right?
If the Voice Engine output sounds low-quality, listen to the audio they're feeding it; that's why. The source audio sounds like a teacher recording in a room on a crappy laptop mic.
That's actually the impressive part. Not only is it very emotionally and phonetically accurate to how the guy in the source recording sounded, but it's also mimicking the edited sound of the audio and the conditions of the recording. As an audio engineer, I find this insane.
Safety Concerns
Bad actors will always find ways to misuse technology, and decent voice cloning models, such as ElevenLabs, are already publicly available. However, OpenAI should still release Voice Engine in a carefully controlled manner after implementing robust safeguards.
The immense potential for restoring voices to patients and enabling accessibility makes responsible release extremely valuable, despite risks that must be mitigated.
OpenAI is taking steps to mitigate these risks. They're working with a limited group of trusted partners to test Voice Engine in controlled environments. Additionally, they've implemented safety measures like watermarking the generated speech and proactive monitoring to detect misuse.
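OpenAI hasn't published how its watermarking works, so the following shouldn't be read as its method. As a rough illustration of the general idea behind audio watermarking, this toy sketch embeds a keyed pseudo-random (spread-spectrum) pattern into a waveform and later detects it by correlation; every function name, key, and strength value here is invented for the example.

```python
import numpy as np

def embed_watermark(audio: np.ndarray, key: int, strength: float = 0.005) -> np.ndarray:
    """Add a keyed pseudo-random pattern at a low amplitude relative to the speech."""
    rng = np.random.default_rng(key)
    pattern = rng.standard_normal(len(audio))
    return audio + strength * pattern

def watermark_score(audio: np.ndarray, key: int) -> float:
    """Z-score of the correlation with the keyed pattern: near 0 if absent, large if present."""
    rng = np.random.default_rng(key)
    pattern = rng.standard_normal(len(audio))
    corr = pattern @ audio / (np.linalg.norm(pattern) * np.linalg.norm(audio))
    return float(corr * np.sqrt(len(audio)))

# Demo on synthetic "speech": five seconds of noise at 16 kHz.
rng = np.random.default_rng(0)
clean = 0.1 * rng.standard_normal(16_000 * 5)
marked = embed_watermark(clean, key=1234)

print(f"unmarked score: {watermark_score(clean, key=1234):.1f}")   # near zero
print(f"marked score:   {watermark_score(marked, key=1234):.1f}")  # far above zero
```

A production-grade watermark would have to survive compression, re-recording, and editing, which is far harder than this toy suggests; that is presumably why proactive monitoring sits alongside the watermark rather than replacing it.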
Voice Engine represents a significant leap forward in artificial intelligence. While the potential benefits are vast, the potential dangers are real. It's up to OpenAI and other developers to ensure this technology is used for good and doesn't become a tool for manipulation. As the debate continues, one thing is certain: the world of voice technology is about to get a lot more interesting, and a lot more complex.