Debunking the 4 Most Common Voice Synthesis Myths

Written by Vova Ovsiienko | Jun 2, 2021 11:59:56 AM

In this article, we examine some of the most prevalent voice synthesis myths and test whether they hold up. We'll also take a closer look at voice synthesis technology and its beneficial impacts.

Before we get started, let's recall what voice cloning (or synthesis) is. At Respeecher we use artificial intelligence (AI) to synthesize speech. You might be familiar with services like Google that can generate speech from the text you type. Respeecher is different. Our software does speech-to-speech voice conversion: instead of replacing a human being, it allows a person to speak in a different voice.

In short, it works like this. The voice cloning system analyzes the target person's voice. Any other person can then perform the speech that is needed.

For a better understanding of the differences between speech-to-speech and text-to-speech, you can consult our voice synthesis FAQ.

Respeecher then synthesizes the dialogue, combining the voice of the target person and the speech spoken by someone else. As a result, we get full-fledged speech in the target voice, except that the target person themselves did not say a single word of it. With their consent, of course.

All the intonations, emotions, and specific characteristics are conveyed in the AI voice with the same precision that the target person themselves would have conveyed them with.

Even when the client lacks high-resolution source recordings, Respeecher can make it work. To address this challenge, we have built an audio version of the super-resolution algorithm to deliver the highest-resolution audio across the board. You can download the whitepaper about Respeecher's audio super-resolution algorithm to find out more.

Does that involve deepfake technology? I heard it is often used with malicious intent

First, we encourage you to read or listen to the Code[ish] Podcast: The Ethical and Technical Side of Deep Fakes. There, we explain in detail how the technology works for both video and audio deepfakes.

It makes no sense to deny that cybercriminals can use AI voice technology to commit crimes and create negative news headlines. As with any technology, the problem is not the approach itself but how it is used by specific people.

To prove our point, here are some examples of how this same technology makes people's lives better:

Synthesized speech helps people with various disabilities speak in their own voice, which they otherwise wouldn't be able to do.
Video and audio deepfakes are widely used in the movie and game industries. The technology helps with dubbing in foreign languages and eases the post-production process.
Deepfakes have multiple use cases in museums and universities. They help re-create authentic historical figures for educational purposes.

As a company actively working with technologies closely related to deepfakes, we take the possible moral and political implications seriously. Respeecher has developed a strict AI ethics code and has implemented tools such as an audio watermark to identify content synthesized using our technology.

The cloned voice still differs from the original, and not for the better

On the Internet, you may come across opinions stating that a voice synthesized using AI and machine learning can never be 100% similar to the original. This is perhaps one of the most easily debunked myths in voice synthesis.

Look at how our Chief Research Officer Grant Reaber speaks in Danielle Cohn's voice. Pretty neat, right?

Speech-to-speech conversion software like Respeecher preserves natural prosody because the system carries the source speaker's prosody (rhythm, stress, and intonation) over into the target voice.

Because the output follows whatever the source speaker performs, content creators get an effectively unlimited prosodic palette, and to the average listener the synthesized AI voice sounds indistinguishable from the original.

Moreover, there are no lip-sync issues or other inconsistencies of the kind traditional dubbing introduces, because the voice produced is a cloned version.

Just watch this quick demo showcasing how our team plays around with the features that Respeecher has to offer. To the layman, the voice quality is indistinguishable from the original. You would not suspect that it's voice synthesis.

A cloned voice is indistinguishable from the original

This myth is the opposite of the previous one: it claims that a synthesized voice is so good that nobody can tell it apart from the original. But as we said above, that is true only for listeners who are not sound professionals.

There are already several solutions on the market that specialize in voice fraud detection. In general, all of them use so-called voice biometric engines. In particular, the software is used to detect deceitful voice samples so that a device or application does not incorrectly grant access to user data.

Also, services like Respeecher develop unique watermarks that are embedded in the synthesized audio recording. They are inaudible to the average listener but easily detectable by sound engineers. The purpose is to make it easier to identify inappropriate content created using deepfake technologies.
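To make the watermark idea concrete, here is a minimal Python sketch of one well-known general approach, spread-spectrum watermarking. It is an illustration only and not a description of Respeecher's actual watermark. The function names, the secret key, the strength value, and the detection threshold are all assumptions chosen for this example. A key-seeded pseudo-random sequence is added to the audio at a very low level, and a detector that knows the key checks whether the audio correlates with that sequence.

```python
import math
import random


def make_carrier(key, n):
    """Pseudo-random +/-1 sequence derived from a secret key (hypothetical scheme)."""
    rng = random.Random(key)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]


def embed_watermark(samples, key, strength=0.01):
    """Add the key-derived carrier to the audio at a low amplitude."""
    carrier = make_carrier(key, len(samples))
    return [s + strength * c for s, c in zip(samples, carrier)]


def detect_watermark(samples, key, strength=0.01):
    """Correlate the audio with the carrier; a high score suggests a watermark."""
    carrier = make_carrier(key, len(samples))
    score = sum(s * c for s, c in zip(samples, carrier)) / len(samples)
    # Unmarked audio scores near 0, marked audio near `strength`,
    # so this toy detector splits the difference.
    return score > strength / 2


def sine_tone(freq_hz, seconds, sample_rate=16000, amplitude=0.3):
    """Stand-in for real speech: a plain sine tone with samples in [-1, 1]."""
    n = int(seconds * sample_rate)
    return [amplitude * math.sin(2 * math.pi * freq_hz * t / sample_rate)
            for t in range(n)]


if __name__ == "__main__":
    clean = sine_tone(220.0, 10.0)
    marked = embed_watermark(clean, key=1234)
    print("marked audio, right key:", detect_watermark(marked, key=1234))
    print("clean audio, right key: ", detect_watermark(clean, key=1234))
    print("marked audio, wrong key:", detect_watermark(marked, key=9999))
```

In this toy version, each sample changes by at most the strength value, which is why the mark stays quiet, while the correlation over many samples makes it easy to find for anyone holding the key. A production watermark would also need to survive compression, resampling, and editing, which this sketch does not attempt.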

Voice synthesis will never be affordable for anyone other than big Hollywood studios

Let's be honest, voice synthesis is unlikely to become available to video bloggers with a small following or private persons any time soon. However, access to this technology isn’t restricted to huge companies and media giants. We've worked with small businesses, educational organizations, and prominent YouTubers.

Beyond lowering the entry threshold, we are constantly working to democratize the synthetic media market. Not so long ago, we launched a Voice Marketplace, where small content creators can access voice cloning technology for a fraction of the cost.

In any case, whether you are a VTuber, a film company, or just curious about how Respeecher works, the use of our technology allows you to avoid having to invest in costly production items such as:

Additional dialogue replacement
Virtual character creation
Voice dubbing
Localization

If you have questions about how you can use speech-to-speech conversion technologies in your project, contact us today. We will gladly advise you on where to start and provide you with a demo and a potential roadmap.
