While video game developers create stories and plots for virtual characters, until recently the dubbing of game character voices wasn't so different from dubbing characters in movies: actors and actresses record the lines, and the production demands a substantial investment of time and money. With the advent of AI voice cloning technology, however, all of this is changing.
A synthetic voice is a human-sounding voice produced by a computer, often using generative AI technologies. A well-made synthetic voice is indistinguishable from the real one: an outside listener cannot tell the actual person's voice from the machine's.
The form most familiar to a lay audience is so-called text-to-speech (TTS) synthesis, in which the computer reads written text aloud and the resulting speech is recorded.
The most common example is asking Google to read the text you type into services like Google Translate.
This type of speech is easily distinguishable from a natural human voice. In a recent post, we examined the technical aspects of this characteristic robotic voice.
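Purely for illustration (this is not part of Respeecher's pipeline), here is a minimal Python sketch of plain TTS using the open-source gTTS package, which calls the same Google voice you hear in Google Translate:

```python
# Minimal text-to-speech sketch using the open-source gTTS package,
# which uses the same Google voice heard in Google Translate.
# Install with: pip install gTTS
from gtts import gTTS

line = "Welcome back, commander. Your orders are ready."
tts = gTTS(text=line, lang="en")  # synthesize the written line
tts.save("line.mp3")              # write the generated speech to disk
```

Listening to the resulting file makes the point above clear: the output is intelligible, but it is instantly recognizable as machine speech.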
Things get more interesting with speech-to-speech (STS) voice conversion. Imagine you need to dub a video game character, but the original voice actor is no longer available.
This is easy to imagine with gaming franchises that have been around for decades. Unfortunately, video game and cartoon characters can easily outlive their human counterparts.
So how do you capture a character's original voice when you don’t have access to the original actor?
Artificial intelligence and machine learning technologies present a solution to these problems. Let's briefly describe the speech-to-speech character voice generation process, as developed by Respeecher:
The first requirement is good-quality audio recordings of the voice being cloned, totaling at least one hour. With anything less, the AI cannot build an accurate voice model.
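As a rough illustration of that requirement (the folder name and file format here are our own assumptions, not Respeecher tooling), a few lines of Python with the soundfile library can total up how much target-voice audio you actually have:

```python
# Sketch: add up the duration of a folder of WAV recordings to check
# whether there is at least one hour of target-voice audio.
# Install with: pip install soundfile
from pathlib import Path
import soundfile as sf

RECORDINGS_DIR = Path("target_voice_recordings")  # hypothetical folder name

total_seconds = sum(sf.info(str(path)).duration
                    for path in RECORDINGS_DIR.glob("*.wav"))
print(f"Collected {total_seconds / 3600:.2f} hours of audio")

if total_seconds < 3600:
    print("Not enough material yet: aim for at least one hour of clean recordings.")
```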
We then feed this data into the machine learning algorithms, which perform complex calculations to form a model of the original voice. Here it is crucial that the original recordings cover as wide a range of emotions, intonations, tones, cadences, and vocal timbres as possible. The more emotionally varied the original speech is, the more accurate the voice model will be.
Once the model is formed, it becomes possible to generate an unlimited amount of audio content that is indistinguishable from the original speaker. All that remains is for someone to record the speech that is needed.
The final stage transforms that person's recorded voice into the voice of the original actor. The conversion morphs every characteristic of the source speech into the authentic actor's voice.
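To make the shape of this workflow concrete, here is a purely conceptual Python sketch. Every name in it (build_voice_model, convert, the file names) is a hypothetical placeholder of our own; Respeecher's actual system is proprietary and is not exposed through this kind of API:

```python
# Conceptual outline of a speech-to-speech workflow.
# All functions and file names are hypothetical placeholders for
# illustration only; they are not Respeecher's actual API.

def build_voice_model(target_recordings: list[str]) -> dict:
    """Stand-in for training: learn a model of the target voice
    from at least one hour of clean recordings."""
    return {"target_voice": "original_actor", "trained_on": target_recordings}

def convert(voice_model: dict, source_recording: str) -> str:
    """Stand-in for conversion: morph a new recording by any source
    speaker so that it sounds like the target voice."""
    return f"{source_recording} rendered in {voice_model['target_voice']}'s voice"

# 1. Train once on archive material of the original actor.
model = build_voice_model(["archive_take_01.wav", "archive_take_02.wav"])

# 2. Record the new dialogue with whichever speaker is available.
new_line = "source_speaker_new_dialogue.wav"

# 3. Convert that recording into the original character's voice.
print(convert(model, new_line))
```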
With this type of AI voice technology now available, video game producers are no longer restricted to dubbing actors. They can even generate unique voices that did not exist before.
Some of Respeecher’s most innovative benefits that allow game developers to save time and money on dubbing and additional dialogue replacement (ADR) include:
Allowing some of the most famous voice actors to participate in the project. This is primarily about the money and time saved on bringing in a celebrity: if you want to incorporate an A-list star in your project, they will probably appreciate that they only need to provide a single, hour-long voice recording. Based on this, Respeecher can generate an unlimited amount of original speech content.
Resurrecting voices from the past. Imagine that in your WWII strategy game, the dialogue of the main characters and villains is delivered in their original voices. You can also easily replace a voice actor who left your project somewhere between the first and second games.
Making it easy to solve the problem of dubbing child actors. When children grow up, their voices change, which can become a problem if your project evolves over time with the same child as the hero. Luckily, you can keep the original character's voice without depending on the actor who originally dubbed them. Learn more about all the benefits of cloning a child's voice here.
Making it easier to adjust game content with AI voice synthesis. If the writer or director makes edits to a scene, you no longer need to bring the voice actor back to record the changes. Instead, the sound engineer can easily implement any modifications to the voice content.