by Margarita Grubina – May 9, 2022 10:17:37 AM • 8 min

What is Cross-Language Voice Conversion and Why It’s Important

The voice conversion industry, fueled by advances in AI voice generator technologies, is expanding rapidly and has already surpassed $1 billion. Applications include voice assistants, talking robots, radio and television programs, dubbing books and films with the voices of famous people, and restoring the voices of those who have died or lost the ability to speak. The list goes on. Over the past few years, the development of voice cloning technology has accelerated noticeably, especially during the pandemic.

Voice cloning and conversion allow companies to cut production costs, save time, and improve communication with their clients. However, the technology is still relatively young, and many organizations are only beginning to discover what AI voice technology has to offer. In this article, we will dive into one of the most popular applications of voice cloning: cross-language voice conversion.

What is Voice Conversion All About?

Voice conversion is a technology that converts one person's voice into the voice of another. It can also be used to render a person's speech in another language while keeping their own voice.

Imagine Beyoncé speaking fluent Mandarin in her own voice: that is exactly what voice conversion makes possible. Technically speaking, voice AI technology allows anyone to sound like almost anyone else in any language (assuming the consent of the individual whose voice is being used).

Such technologies have existed for a long time, but until the early 2010s, converted voices sounded mechanical. With the advancement of technology and AI, it is now possible to literally decompose the human voice “into atoms”, capture all its characteristics and nuances, and create a voice that does not belong to a person, but sounds absolutely human, as well as synthesize the voices of specific people.

Specialists in cloning human voices explain that teaching a computer to speak like a person is not easy at all: the human voice has many different characteristics. “To analyze the human voice, you need to know a lot about acoustics, the principles of speech sound, you need to understand the physiological aspects,” explains Klaus Scherer, professor emeritus of emotion psychology at the University of Geneva. “So this process always necessarily involves different disciplines, and it requires a lot of planning that it is necessary to master in order to achieve something worthwhile.”

How does Cross-Language Voice Conversion Work?

When cloning the voice of a particular person, specialists take samples of their speech.

In total, about an hour should be recorded, and 10–15 minutes of recording will be used for the cloning process.

These recordings are loaded into a neural network, which then generates a voice, taking into account all possible nuances. The result is a voice that is almost indistinguishable from the original.
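As a rough illustration of the step above, where about an hour is recorded but only 10–15 minutes are used, here is a minimal Python sketch of how a subset of clips might be chosen. It is not Respeecher's actual method: the Clip class, the quality score, and the greedy selection rule are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    name: str
    seconds: float
    quality: float  # hypothetical 0..1 score, e.g. derived from a noise estimate


def select_training_clips(clips, target_seconds=15 * 60):
    """Greedily pick the highest-quality clips until the time budget is filled."""
    chosen = []
    total = 0.0
    for clip in sorted(clips, key=lambda c: c.quality, reverse=True):
        # Skip any clip that would push the selection past the budget.
        if total + clip.seconds <= target_seconds:
            chosen.append(clip)
            total += clip.seconds
    return chosen, total


# About an hour of recordings: sixty one-minute clips with made-up quality scores.
recorded = [Clip(f"take_{i:02d}", 60.0, quality=(i % 10) / 10) for i in range(60)]
chosen, total = select_training_clips(recorded)
print(f"{len(chosen)} clips, {total / 60:.0f} minutes selected")
```

With this invented data, the sketch keeps 15 one-minute clips, the upper end of the 10–15 minute range mentioned above. A real pipeline would measure quality from the audio itself rather than from made-up scores.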

If the voice of someone who has long since passed is being cloned, the procedure is the same. For example, the voice of the famous American chef, writer, and TV presenter Anthony Bourdain, who died in 2018, was recreated through speech-to-speech voice conversion for the documentary film Roadrunner.

To recreate Bourdain's voice, director Morgan Neville collected tens of thousands of hours of video and audio recordings. Based on this data, the chef's voice was recreated and used in several parts of the film.

We often encounter challenges when working with old recordings because some of them suffer from quality issues of all kinds. But thanks to constant improvements to our AI algorithms, we deal with all the challenges and deliver the highest resolution audio across the board. You can find out more by downloading this whitepaper about Respeecher's audio super-resolution algorithm.

Famous examples of using Respeecher’s voice conversion technology include:

Synthesizing a younger Luke Skywalker's voice for Disney+'s The Mandalorian
Aloe Blacc paying tribute to Avicii singing ‘Wake Me Up’ in five languages
Re-creating Manuel Rivera Morales’ voice for the Olympic Games
Creating a digital Vince Lombardi for Super Bowl LV

And many more.

For what Purposes can Voice Conversion Technology be Used?

There are a variety of use cases that voice conversion simplifies in different industries. These include:

Adapting the voices of actors for the localization of films
Voice acting for game characters
Voice greetings
Audiobook narration, including fairy tales read by professional narrators and converted into the cloned voices of parents
Creation of audio and video courses
Promotional videos and audio ads
Voices of bots and smart devices, personalized voice assistants
Synthesis of natural-sounding oral speech for people who have lost the ability to speak (using recordings of their earlier speech)
Adapting oral speech into a local accent model

Below, we’ll look at a few of these use cases in more depth.

Entertainment and advertising

Re-dubbing an English-speaking star into Japanese or Ukrainian will never feel or appear as authentic on-screen as the original language does. Moreover, the process takes many hours and requires actors to be present in studios for extended sessions.

AI voice generation can solve this problem. As with Beyoncé speaking Mandarin, you can make your actors speak any language you need. All you need is a recording of the actor’s speech and a volunteer to read the necessary text in the other language.

Dubbing and localization

Localization and dubbing agencies put an incredibly high workload on the shoulders of their dubbing actors. A typical practice is to use the voices of ten to twenty actors in many films, video games, and advertisements every year. Of course, this leads to overload.

Voice conversion frees agencies from relying on the same overloaded actors from project to project. The dubbed lines can be performed by anyone and then converted into the original actor’s voice.

Social purposes

With Respeecher’s recent advancements in voice cloning technology that millions have already witnessed in action in Hollywood films, we can, for example, give a native English speaker the ability to speak any other language.

This opens doors for social campaigns in which individuals can present an idea clearly to the whole world, in the native language of any given country.

In response to the current war in Ukraine, Respeecher announced an initiative to give celebrities a chance to use their voice in the language of the Ukrainian people. We encourage celebrities to use their voice to support Ukraine and its people. Respeecher's voice cloning technology will help them speak fluent Ukrainian to encourage and warm the hearts of the nation.

Please reach out to us at WithUkraine@respeecher.com if you want to join the initiative.

In a Nutshell

Nelson Mandela once said: “If you talk to a man in a language he understands, that goes to his head. If you talk to him in his own language, that goes to his heart.”

A modern company’s marketing should be highly personalized to attract more customers and outperform competitors. Speaking the native language of the demographic you are targeting is critical for reaching customers on a deeper level and keeping them with you for a long time.

Voice conversion technology now allows you to reach out to a global audience with your message and provide support to different groups of people all around the world.

We encourage you to get in touch with Respeecher for a brief consultation regarding the use of our AI voice generator and scaling cross-language conversion in the best way possible.

FAQ

What is cross-language voice conversion?

Cross-language voice conversion is a process in which AI voice technology is used to transform someone's voice into another language while preserving their unique voice characteristics. With voice cloning technology, people can speak in multiple languages with their own voice, as seen in celebrity messages supporting Ukraine.

How does voice cloning technology work?

Voice cloning technology uses AI voice generators to analyze speech samples and recreate an individual’s voice. The process involves capturing the nuances of a person's voice and generating speech that sounds like them, even in different languages, using cross-language voice conversion and advances in voice synthesis.

Which industries benefit from voice conversion?

The voice conversion industry benefits entertainment, dubbing, advertising, gaming, and social causes. It’s used for AI-powered multilingual voices, personalized voice assistants, and localization with AI to make content more accessible, including celebrity messages in Ukrainian for global solidarity.

How does AI voice technology help with dubbing and localization?

AI voice technology streamlines dubbing and localization by converting actors' voices into different languages while maintaining their unique tone and style. Voice cloning and cross-language voice conversion enable faster, more authentic translations, reduce the need for extensive studio sessions, and offer cost-effective voice dubbing solutions.

Can AI-generated voices be used for social campaigns?

Yes, AI-generated voices can power social campaigns by allowing individuals or celebrities to speak in any language, fostering connection and empathy. For example, Respeecher’s voice cloning technology enables celebrity messages in Ukrainian, supporting global solidarity with Ukraine during the ongoing war.

Glossary

AI voice generator

A tool that uses voice cloning technology and AI voice technology to create lifelike, AI-powered multilingual voices, enabling cross-language voice conversion and localization with AI.

Voice cloning technology

A method using AI voice generators to replicate a person’s voice, enabling AI-powered multilingual voices, cross-language voice conversion, and voice dubbing solutions.

Cross-language voice conversion

A process using AI voice technology and voice cloning technology to transform a person's voice into another language, enabling AI-powered multilingual voices and localization with AI.

Speech-to-speech synthesis

A process that uses AI voice technology and voice cloning technology to convert speech in one language into speech in another, enabling AI-powered multilingual voices and localization with AI.

AI voice localization

A technique using AI voice technology and voice cloning technology to adapt voices to different languages and regions, enabling AI-powered multilingual voices and voice dubbing solutions.

Margarita Grubina

Business Development Executive

Margarita drives Respeecher's growth through strategic market analysis and nurturing client relations. Her role is pivotal in discovering and tapping into new market opportunities, as well as maintaining strong connections with clients. She combines her industry expertise with a forward-thinking approach, ensuring Respeecher's offerings resonate with evolving market needs in the dynamic field of voice AI technology.