by Alex Serdiuk – Jun 15, 2021 6:43:48 AM • 8 min

What Are Deepfakes: Synthetic Media Explained


Deepfakes are among the most striking phenomena of the last five years in the world of synthetic media. Many people fear the technology, while others have figured out how to put it to productive use. It's time to examine what deepfakes actually are and what makes them so significant in the world of modern media and generative AI technologies.

What is synthetic media and what makes up its market landscape?

Synthetic media is AI-generated or AI-modified content. With traditional media, people relied on broadcasting networks (radio and TV) to create and distribute content.

This model of distribution came with significant restrictions, many of which were later loosened by the emergence of social networks.

With synthetic media, creators can produce content at a level of quality that was previously available only to major studios with massive budgets.

AI content is cheaper to produce and easier to scale. However, this democratization of content creation, driven by AI voice generators and other voice AI technologies, comes with ethical considerations, most notably the need to differentiate AI-synthesized content from genuine content.

Respeecher, a pioneer in voice cloning technology, uses watermarking that makes it simple to distinguish Respeecher-generated content from other audio, even when it is mixed in with other sounds. As a key player in the voice cloning market, we take ethics very seriously, which is why we follow a strict voice cloning ethics code. Find out more on the Respeecher FAQ page.
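The general principle behind such audio watermarks can be illustrated with a toy spread-spectrum sketch. To be clear, this is purely illustrative: Respeecher's actual watermarking scheme is proprietary, and the signal sizes and strengths below are invented for the example. The idea is to add a faint pseudo-random signal known only to the detector; correlating a recording against that secret key reveals the mark even after other sounds are mixed over the top.

```python
import numpy as np

rng = np.random.default_rng(42)

n = 16_000                              # one second of audio at 16 kHz
t = np.arange(n) / 16_000
audio = np.sin(2 * np.pi * 440 * t)     # a plain 440 Hz tone as "clean" audio

key = rng.standard_normal(n)            # secret pseudo-random key
strength = 0.05
watermarked = audio + strength * key    # embed a faint, key-shaped signal

def detect(signal, key):
    # Correlation against the secret key: high for watermarked audio,
    # near zero for anything else, even with interference mixed in.
    return float(signal @ key) / len(key)

other_sounds = 0.1 * rng.standard_normal(n)   # "other sounds" mixed over the top
score_marked = detect(watermarked + other_sounds, key)
score_clean = detect(audio + other_sounds, key)
```

Because the key is uncorrelated with both the music and the interference, the correlation score stays close to the embedding strength for marked audio and close to zero otherwise, which is what makes this style of detection robust to mixing.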

While the tech community and policymakers formulate proper regulation, movie studios, video bloggers, and the education sector are reaping the benefits of this technology.

The current landscape for the synthetic media market has been covered in detail by a recent Samsung Next study. Here are the key media sectors disrupted by this emerging technology: 

  1. Speech and voice synthesis
  2. Music and sound synthesis
  3. Image synthesis
  4. Video synthesis
  5. Game content synthesis
  6. Digital avatar synthesis
  7. Mixed reality synthesis
  8. Natural-language generation

Keep in mind that the majority of synthetic media use cases run on deepfake technology.

Deepfakes in a nutshell 

In short, deepfakes are artificial intelligence-based image and sound synthesis techniques. They are used to join and overlay existing images, video, and audio onto source content.

In most cases, deepfakes rely on generative adversarial networks (GANs) to create this type of content. One part of the network (the generator) learns from real media and produces synthetic images that "compete" with the second part (the discriminator), which tries to tell them apart from the originals. Training continues until the discriminator starts confusing the generated copies with the real thing.
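This adversarial tug-of-war can be sketched with a deliberately tiny numpy example: a one-dimensional "GAN" whose generator learns to mimic samples from a Gaussian distribution. Real deepfake systems use deep networks over images or audio; this toy, with made-up learning rates and a hand-derived logistic discriminator, only shows the competing-updates dynamic.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# "Real media": samples from N(3, 1). The generator must learn to mimic them.
real_mean = 3.0

# Generator g(z) = a*z + b, logistic discriminator d(x) = sigmoid(w*x + c).
a, b = 1.0, 0.0
w, c = 0.1, 0.0
lr, batch = 0.05, 64

for step in range(2000):
    real = rng.normal(real_mean, 1.0, batch)
    z = rng.standard_normal(batch)
    fake = a * z + b

    # --- Discriminator step: push d(real) toward 1 and d(fake) toward 0 ---
    d_real, d_fake = sigmoid(w * real + c), sigmoid(w * fake + c)
    w += lr * np.mean((1 - d_real) * real - d_fake * fake)
    c += lr * np.mean((1 - d_real) - d_fake)

    # --- Generator step: push d(fake) toward 1 (non-saturating loss) ---
    d_fake = sigmoid(w * fake + c)
    grad = (1 - d_fake) * w          # gradient of log d(fake) w.r.t. fake
    a += lr * np.mean(grad * z)
    b += lr * np.mean(grad)

# After training, the generator's output should center near the real mean.
```

Each iteration the discriminator gets slightly better at spotting fakes, which in turn gives the generator a gradient to make its fakes more convincing; the generator's offset `b` drifts from 0 toward the real mean of 3.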

Here's how deepfakes work in three key steps (using video production as an example):

  1. The process begins by feeding original video or voice recordings of the target subject into a neural network. Autoencoder and GAN algorithms go to work analyzing the subject's facial expressions and key features.
  2. Combining an autoencoder with a GAN lets the algorithm generate fake images until the discriminator can no longer distinguish them from the originals.
  3. Video of a stand-in (for example, a stunt double) is then fed into the network. Having learned the target subject's facial characteristics, the network can easily generate a deepfake: the target subject's face is overlaid onto the stand-in's footage.

Voice cloning relies on different algorithms, but the overall process is much the same.
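The autoencoder trick behind classic face-swap deepfakes is a shared encoder paired with one decoder per identity: any face is compressed into a common latent space, and decoding with the *other* person's decoder performs the swap. Here is a toy numpy sketch of that layout; the 16-dimensional "faces", network sizes, and training settings are invented for illustration and bear no resemblance to a production pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "faces": 16-dimensional vectors standing in for images of person A
# and person B.
faces_a = rng.normal(1.0, 0.5, size=(200, 16))
faces_b = rng.normal(-1.0, 0.5, size=(200, 16))

latent = 4
enc = rng.normal(scale=0.1, size=(16, latent))     # shared encoder
dec_a = rng.normal(scale=0.1, size=(latent, 16))   # decoder for person A
dec_b = rng.normal(scale=0.1, size=(latent, 16))   # decoder for person B

def mse(x, y):
    return float(np.mean((x - y) ** 2))

err_before = mse(faces_a @ enc @ dec_a, faces_a)

lr = 0.02
for step in range(1000):
    for faces, dec in ((faces_a, dec_a), (faces_b, dec_b)):
        z = faces @ enc          # encode into the shared latent space
        recon = z @ dec          # decode with the identity-specific decoder
        err = recon - faces
        # Hand-derived gradients of the squared reconstruction error.
        grad_dec = z.T @ err / len(faces)
        grad_enc = faces.T @ (err @ dec.T) / len(faces)
        dec -= lr * grad_dec
        enc -= lr * grad_enc

err_after = mse(faces_a @ enc @ dec_a, faces_a)

# The "swap": encode person B with the shared encoder, decode with A's
# decoder, mapping B's "expressions" onto A's "identity".
swapped = faces_b @ enc @ dec_a
```

Because the encoder is trained on both identities, its latent space captures what the faces have in common (pose, expression), while each decoder learns one identity's appearance; swapping decoders is what transfers a face onto new footage.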

Deepfake use cases 

The most common examples of deepfakes are videos in which the authors swap one person's face for another's. You can find many deepfake cosplays of Hollywood actors like Tom Cruise or Arnold Schwarzenegger on the web. Less common are genuinely original projects that elevate the technology to an art form.

One such project is the resurrection of Vince Lombardi's voice for the Super Bowl. Respeecher created Lombardi's speech for the project, and the final product showcases how AI voice cloning can breathe life into historical figures.

Here's what Abigail Savage, sound designer and actress who starred in Orange Is the New Black, had to say about Respeecher's AI-synthesized voice cloning:

Respeecher is a remarkable tool for Sound Editors. It delivers very high-fidelity recreations of a target voice, with transparent performance-matching of its source. It blows text-to-speech out of the water! The effect is uncanny and incredibly effective and I can imagine a whole slew of uses going forward.


Abigail Savage, Sound Designer and Actress

The range of deepfake use cases is not limited to video production. It spans multiple industries, from marketing to museums and education. Here you can find more examples of deepfakes in marketing projects.

Another industry widely using deepfake technology is education. The British advertising holding company WPP is revolutionizing training programs through AI voice cloning, employing neural networks to create virtual mentors.

Fifty thousand people learn the basics of marketing through video courses with a virtual mentor who lectures in three languages: English, Spanish and Chinese. At the same time, the mentor addresses each employee personally by name.

The most prominent project in architecture to date is NVIDIA's GauGAN, a neural network that turns rough sketches into realistic images. The program helps architects develop building designs from drawings and lets game designers create levels faster.

Investigative journalists use deepfakes to change the appearance of sources who want to remain anonymous in reports. This technique, for example, was used by HBO when creating the documentary film, "Welcome to Chechnya."

And one of the most famous examples is Wireless Lab's FaceApp, an application that alters people in photos: their apparent gender, age, appearance, and ethnicity.

Benefits of using deepfake in audio and video production

Using deepfakes can significantly reduce production timelines and costs while helping studios scale their output. In short, here are a few of the most compelling selling points for production studios:

  • Freeing up actors' time. Artificial intelligence systems can generate entire scenes with actors without having to bring them in for live filming sessions.
  • Post-production savings. Re-recording and automated dialogue replacement (ADR) can be completed without involving the actors.
  • Digital avatars instead of living people. Once created, virtual characters can run customer service, staff a reception desk, or act as a chatbot, increasing customer engagement.
  • Content localization. Voice AI technologies allow you to quickly dub audio tracks into many languages while preserving the actor's original voice. The final result sounds as if the actor were a native speaker of the foreign language. Respeecher also built an audio version of the super-resolution algorithm to deliver the highest-resolution audio across the board, even when the client doesn't have high-res sources available. Learn more by downloading this audio super-resolution whitepaper.

If you're looking to learn more about deepfake technology, we encourage you to take this LinkedIn course: Understanding the Impact of Deepfake Videos.

Conclusion

It's easy to feel overwhelmed by the onset of the synthetic media era and the sense that it cannot be stopped. But if you can ride the trend and use the new opportunities generative AI technologies offer to develop your projects and business, you'll be well ahead of the curve for years to come.

If you need to synthesize an original voice, rejuvenate a voice, or carry a voice into another language, contact us today and we will help you identify the best options for your project.

Alex Serdiuk
CEO and Co-founder
Alex founded Respeecher with Dmytro Bielievtsov and Grant Reaber in 2018. Since then, the team has focused on high-fidelity voice cloning. Alex is in charge of business development and strategy. Respeecher technology is already applied in feature films and TV projects, video games, animation, localization, media agencies, healthcare, and other areas.
