What is Cross-Language Voice Conversion and Why It’s Important
The voice conversion industry, fueled by advancements in AI voice generator technologies, is expanding rapidly, already surpassing $1 billion. The range of applications for this type of technology includes voice assistants, talking robots, radio and television programs, dubbing books and films with the voices of famous people, and restoring the voices of those who have died or lost the ability to speak, and the list goes on. Over the past few years, the development of voice cloning technology has accelerated noticeably, especially during the pandemic.
Voice cloning and conversion allow companies to save significant production costs and time and to improve communication with their clients. However, it is still a relatively young technology, and many organizations are still discovering all the possibilities that AI voice technology has to offer. In this article, we will dive into one of the most popular applications of voice cloning, known as cross-language voice conversion.
What is Voice Conversion All About?
Voice conversion is a technology that converts one person’s voice into the voice of a different person. It can also be used to render a person’s speech in another language while keeping the sound of their own voice.
Imagine Beyoncé speaking fluent Mandarin in her own voice: that is the kind of result cross-language voice conversion makes possible. Technically speaking, voice AI technology allows anyone to sound like almost anyone else in any language (assuming the consent of the individual whose voice is being used).
Such technologies have existed for a long time, but until the early 2010s, converted voices sounded mechanical. With the advancement of technology and AI, it is now possible to decompose the human voice “into atoms”, capture all its characteristics and nuances, and either create a voice that does not belong to any real person but sounds entirely human, or synthesize the voices of specific people.
Specialists in cloning human voices explain that teaching a computer to speak like a person is not easy at all: the human voice has many different characteristics. “To analyze the human voice, you need to know a lot about acoustics, the principles of speech sound, you need to understand the physiological aspects,” explains Klaus Scherer, professor emeritus of emotion psychology at the University of Geneva. “So this process always necessarily involves different disciplines, and it requires a lot of planning that it is necessary to master in order to achieve something worthwhile.”
How does Cross-Language Voice Conversion Work?
When cloning the voice of a particular person, specialists take samples of their speech.
About an hour of speech should be recorded in total, of which 10–15 minutes will be used for the cloning process.
These recordings are loaded into a neural network, which then generates a voice, taking into account all possible nuances. The result is a voice that is almost indistinguishable from the original.
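To make the data-preparation step more concrete, here is a minimal sketch of how roughly an hour of recordings might be narrowed down to a 10–15 minute training set. It is an illustration only, not Respeecher's actual pipeline: the Clip class, the quality score, the select_training_clips function, and the file names are all hypothetical, and a real system would choose material using far richer criteria.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    name: str        # hypothetical file name of one recorded take
    seconds: float   # duration of the take
    quality: float   # assumed score in [0, 1]; higher means cleaner audio


def select_training_clips(clips, target_seconds=15 * 60):
    """Greedily pick the cleanest clips that still fit in the time budget."""
    chosen, total = [], 0.0
    # Python's sort is stable, so clips with equal scores keep their order.
    for clip in sorted(clips, key=lambda c: c.quality, reverse=True):
        if total + clip.seconds <= target_seconds:
            chosen.append(clip)
            total += clip.seconds
    return chosen, total


# One hour of raw material: twelve five-minute takes with made-up scores.
recordings = [
    Clip(f"take_{i:02d}.wav", 300.0, quality=(i % 6) / 5) for i in range(12)
]

selected, total = select_training_clips(recordings)
print([clip.name for clip in selected], f"{total / 60:.0f} minutes")
```

With the default budget of 15 minutes, the function keeps only the highest-scoring takes and skips anything that would push the total over the limit. Passing target_seconds=10 * 60 would model the lower end of the 10–15 minute range.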
If the voice of someone who has passed away is being cloned, the procedure is the same. For example, a speech-to-speech voice conversion of the famous American chef, writer, and TV presenter Anthony Bourdain, who died in 2018, was used for the documentary film Roadrunner.
To recreate Bourdain's voice, director Morgan Neville collected tens of thousands of hours of video and audio recordings. The chef's voice was recreated from this material and is used in several parts of the film.
We often encounter challenges when working with old recordings because some of them suffer from quality issues of all kinds. But thanks to constant improvements to our AI algorithms, we can handle these issues and deliver the highest-resolution audio across the board. You can find out more by downloading this whitepaper about Respeecher's audio super-resolution algorithm.
Famous examples of using Respeecher’s voice conversion technology include:
- Synthesizing a younger Luke Skywalker's voice for Disney+'s The Mandalorian
- Aloe Blacc paying tribute to Avicii singing ‘Wake Me Up’ in five languages
- Re-creating Manuel Rivera Morales’ voice for the Olympic Games
- Creating a digital Vince Lombardi for Super Bowl LV
And many more.
For what Purposes can Voice Conversion Technology be Used?
There are a variety of use cases that voice conversion simplifies in different industries. These include:
- Adapting the voices of actors for the localization of films
- Voice acting for game characters
- Voice greetings
- Audiobook narration, including cloning parents' voices for fairy tales read by professional narrators
- Creation of audio and video courses
- Promotional videos and audio ads
- Voices of bots and smart devices, personalized voice assistants
- Synthesis of natural-sounding oral speech for people who have lost the ability to speak (using recordings of their earlier speech)
- Adapting speech to a local accent
Below, we look at a few of these use cases in more depth.
Entertainment and advertising
Re-dubbing an English-speaking star into Japanese or Ukrainian will never feel or appear as authentic on-screen as the original language does. Moreover, the process takes many hours and requires actors to be present in studios for extended sessions.
AI voice generation can solve this problem. As with Beyoncé speaking Mandarin, you can make your actors speak any language you need. All you need is a recording of the actor’s speech and a volunteer to read the necessary text in the other language.
Dubbing and localization
Localization and dubbing agencies put an incredibly high workload on the shoulders of their dubbing actors. A typical practice is to use the voices of ten to twenty actors in many films, video games, and advertisements every year. Of course, this leads to overload.
Voice conversion frees agencies from relying on the same overloaded actors from project to project. The dubbed lines can be performed by anyone and then converted into the original actor’s voice.
Social purposes
With Respeecher’s recent advancements in voice cloning technology that millions have already witnessed in action in Hollywood films, we can, for example, give a native English speaker the ability to speak any other language.
This opens the door to social campaigns in which individuals can clearly present an idea to the whole world, in the native language of any specific country.
In response to the war in Ukraine, Respeecher has announced an initiative to give celebrities a chance to use their voice in the language of the Ukrainian people. We encourage celebrities to use their voice to support Ukraine and its people. Respeecher's voice cloning technology will help them speak fluent Ukrainian to encourage and warm the hearts of the nation.
Please reach out to us at WithUkraine@respeecher.com if you want to join the initiative.
In a Nutshell
Nelson Mandela once said: “If you talk to a man in a language he understands, that goes to his head. If you talk to him in his own language, that goes to his heart.”
A modern company’s marketing should be highly personalized to attract more customers and outperform competitors. Speaking the native language of the demographic you are targeting is critical for reaching customers on a deeper level and keeping them with you for a long time.
Voice conversion technology now allows you to reach out to a global audience with your message and provide support to different groups of people all around the world.
We encourage you to get in touch with Respeecher for a brief consultation regarding the use of our AI voice generator and scaling cross-language conversion in the best way possible.