The Rise of Ethical Voice Cloning in the Deepfake Voice Wars

Written by Anna Bulakh | Apr 12, 2022 2:00:00 PM

Deepfake voice technology has evolved dramatically over the past decade. These advancements have made the technology increasingly popular in multiple industries, including entertainment, movies, marketing, healthcare, and customer service. With the rise of synthetic media, deepfake voice generators have become increasingly sophisticated and accessible, allowing for highly realistic audio manipulation and voice synthesis.

Such rapid growth and demand are always accompanied by active discussions about the ethical use of new technologies. The notoriety of deepfake videos has not helped the debate. Nevertheless, voice cloning software can be used safely and ethically today. This article will explain how and why. With increasing concerns about privacy, consent, and authenticity, addressing AI ethics becomes paramount in the development and deployment of such generative AI technologies.

The rise of AI voice cloning technology

AI voice cloning technology got its start not long ago with simple speech synthesizers: programs capable of converting text to human speech. Even today, this generative AI technology is one of the most widespread. For example, Google Translate can read a text aloud in a foreign language after translating it.

Text-to-speech voice conversion reached its peak in products like Descript's Overdub, an ultra-realistic text-to-speech voice cloning tool widely used in podcasting and radio. Services like Overdub help create pieces of audio content without producers having to reach out to voice actors.

After realistic voice generators, AI deepfake voice technology made its way onto the market. Using machine learning and AI algorithms, Respeecher was able to create a unique technology capable of converting one person's voice into the voice of someone else. As the demand for synthetic media grows, innovations like AI voice cloning offer new possibilities and challenges in various industries. We've examined in detail how this AI voice cloning technology is changing content production for the better in a series of articles on our blog.

In short, you can use an AI voice generator to convert the voice of any person (gender does not matter) into the voice of a target person. There is only one requirement: the algorithm needs an hour-long, high-quality recording of the target's voice so that the AI can generate its model correctly. Once the model has been generated, you can convert an unlimited amount of speech into your target voice without sacrificing the source voice's intonations, cadence, particular vocal emphasis, etc.
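As a small illustration of the hour-long recording requirement, here is a minimal sketch that checks the duration of a target recording before any model is trained. It assumes the recording is an uncompressed PCM WAV file and uses only Python's standard `wave` module. The function names and the `MIN_TARGET_SECONDS` constant are hypothetical and are not part of any Respeecher tool or API.

```python
import wave

# Assumption: "hour-long" from the text, expressed in seconds.
MIN_TARGET_SECONDS = 60 * 60


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_long_enough(duration_seconds, minimum=MIN_TARGET_SECONDS):
    """Check whether a recording meets the minimum target duration."""
    return duration_seconds >= minimum
```

A typical use would be calling `is_long_enough(wav_duration_seconds("target.wav"))` on your own file. Note that this only checks length; audio quality still has to be judged separately.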

In case the original audio recordings are not of the best quality, especially if they are old, Respeecher built an audio version of the super-resolution algorithm to deliver the highest resolution audio across the board. Want to find out more? Download this whitepaper on audio super-resolution with Respeecher.

Ethical doubts around the voice cloning process

As you can see, there is nothing unethical about voice cloning technology itself. And although it uses the same AI technology as video deepfakes, there are significantly fewer examples of defamatory deepfake voices.

However, it is becoming more common for deepfakes to combine audio and video with the goal of deceiving as many people as possible. Here are the most famous examples.

Voice scammers

Every human being's voice is unique. This is why some government and financial institutions use voice authentication to control access to private assets. In everyday life, most people also rely on their natural ability to distinguish the voices of friends and family when they cannot see them.

All this creates ideal circumstances for those with bad intentions to gain access to people's personal information or money.

Lawmakers and law enforcement agencies in many countries are busy establishing proper regulations for producing and using artificially synthesized voices. In the United States, a bill called the Defending Each and Every Person from False Appearances by Keeping Exploitation Subject to Accountability Act (the DEEP FAKES Accountability Act) was introduced in Congress in 2019.

Fake news

In 2020, fake news was estimated to have cost the global economy up to $78 billion. In 2019, cybersecurity company Deeptrace reported that the number of deepfake videos circulating online had surpassed 15,000, and the number was on track to keep doubling each year.

Deepfakes are widely used in the political arena to mislead voters and manipulate facts. All this can create financial risks and damage the very fabric of our society.

Controversial media applications

Even without malicious intent, some deepfake applications in media don't quite meet ethical standards.

One such example would be the 2021 Anthony Bourdain deepfake controversy.

A film detailing the life of Anthony Bourdain encountered backlash after the director disclosed that the producers had used deepfake voice technology. Some of Bourdain's quotes were narrated using a cloned voice because the filmmakers did not have original audio recordings of them.

Naturally, this raised concerns in the community. Because the technology makes it possible to alter historical facts, there is a serious need to ensure that voice cloning is produced ethically. In this regard, the AI engineering community is constantly working to improve the recognition of audio and video deepfakes.

Be that as it may, there are many more positive examples of utilizing deepfake voice technology than negative ones. Here are just a few.

Recent examples of ethical AI voice cloning

Here at Respeecher, we take AI ethics very seriously. That's why we are committed to following a strict ethical code for voice cloning.

Here are just a few projects from our portfolio. As you will see, every single one was created in close cooperation with the copyright holders and, where the voice belongs to someone who has passed away, with their families, so that any concerns over a project's use of a voice can be addressed.

We recommend taking a quick look at these stories:

Respeecher synthesized a younger Luke Skywalker's voice for Disney+'s The Mandalorian
Respeecher Gives Voice to Michael York in Healthcare Initiative
Manuel Rivera Morales' Voice Re-created by AI for the Olympic Games
Revealed: How Respeecher Took Part in Creating a Digital Vince Lombardi for Super Bowl LV

The titles speak for themselves and include resurrection projects and voice cloning for actual living celebrities and movie stars, showcasing the capabilities of ethical synthetic media.

As you can see, there's no inherent evil in a deepfake generator in and of itself. However, there are those who intentionally disregard responsibility or use generative AI with malicious intent.

The future of voice conversion as Respeecher sees it

With developments like the recent Respeecher and Veritone partnership or voice cloning making its way to Hollywood, it's evident that voice cloning is here to stay. As pioneers of the technology, we want to ensure that voice cloning is applied ethically.

In addition to purely technical measures, which include the development of algorithms for deepfake identification and voice watermarking, we are working to democratize and educate the market.

Making AI voice cloning technology understandable and accessible to as many businesses and creative projects as possible through our AI voice generator helps protect the community from scammers and unethical use.

Contact us if you're looking for a trustworthy partner for your media, marketing, or healthcare initiative. We are always eager to help.
