In today's world, technological advancements have played a pivotal role in shaping various industries, and the media and film production industry is no exception. With the emergence of artificial intelligence and voice cloning technologies, it is becoming easier to produce and edit films in a way that was never thought possible before.
Recently, Alex Serdiuk, the CEO of Respeecher, participated in a podcast titled Adopt and Adapt: Solutions for Smart Producing. Along with other EFM Startups Alumni, Maria Tanjala, Fiona Gillies, and Max Hermans, Serdiuk talked about the innovative production tools developed by Respeecher and the creative ways technology can be used to empower the future of film production.
During the podcast, Serdiuk discussed how Respeecher's technology works and how it can be used in film production. He highlighted that the technology could be used to create voiceovers in different languages, which would enable filmmakers to reach a wider audience. This is particularly relevant in today's globalized world, where films and TV shows are watched by people from all corners of the world. The ability to dub movies and TV shows in different languages could potentially increase the revenue generated by a film or TV show, making it more profitable for the producers.
Moreover, Serdiuk also talked about how Respeecher's technology can be used to replace dialogues in movies and TV shows. This could be particularly useful when an actor is unable to re-record their lines due to scheduling conflicts or other reasons. With Respeecher's technology, filmmakers could simply clone the actor's voice and replace the dialogue, making it seem like the actor has re-recorded their lines. This could save time and money in the film production process.
Another exciting use of Respeecher's technology is the ability to bring back the voices of deceased actors. This could be particularly relevant in films where the actor has passed away before completing their role. With Respeecher's technology, filmmakers could clone the actor's voice and complete their lines, making it seem like the actor is still present in the film. This could potentially lead to a revival of classic movies and TV shows, as well as the creation of new content featuring beloved actors who have passed away.
Alex Serdiuk's participation in the EFM Podcast highlights the growing importance of technology in the film production industry. The ability to clone voices and replace dialogues has the potential to revolutionize the industry, making it easier and more cost-effective to produce high-quality content. With the continued development of this technology, we can expect to see more exciting innovations in the film production industry in the future.
Listen to the full podcast episode here or read the transcription featuring Alex Serdiuk below.
AC Coppens: And now I welcome Alex Serdiuk, the co-founder and CEO of Respeecher, the voice cloning tool for content creators. And by the way, they are based in Ukraine and have been in business since 2018, when Alex Serdiuk, Dmytro Bielievtsov, and Grant Reaber founded Respeecher. Since then, the team has been focused on improving voice cloning technology in several directions, and it has already been applied in feature films, TV projects, video games, animation, localization, media agencies, and healthcare, to name a few sectors.
But tell me, are you a group of computer nerds obsessed with audio, as you wrote on your website?
You were frustrated by the robotic voices in automotive navigation systems, and you decided to build something better. So from Shah Rukh Khan to Werner Herzog, the tool's capacity for cloning voices sounds great, and it's based on AI and machine learning.
Tell me, what is the range of its application, like from voice cloning, dubbing, with a focus on the film industry, but how can it be used in the film industry exactly?
Alex Serdiuk: Thank you so much, AC. It's great to be here. Thanks for having me.
Yeah, when we started Respeecher, we understood that there were no technologies that could provide a synthetic voice of a quality that would meet Hollywood standards. So we wanted to create something that would get past the picky sound engineers in Hollywood studios. Luckily, we succeeded.
It took us about a year before our technology appeared in big productions starting in 2019. And when we think about technology and its applications, we start brainstorming and listing use cases.
And those use cases currently number more than 100. And just a fraction of them are in the media industry. That would be the biggest fraction so far, but that's a starting point.
And the thing is, what the technology essentially does, not just ours but all synthetic speech technologies, including text-to-speech, which is extremely scalable but not that good in terms of emotional control, is detach a human from their voice.
So that's something completely new that's being brought to this world, not just to this particular industry.
And when we detach a human from their voice, that means that one human can be in many places at the same time, or one human can do many voices, not being limited to their voice. They can do their young voice, they can do just a different voice, or they can apply accents. There are plenty of use cases that are being built on top of this.
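The idea of detaching a performance from a vocal identity can be sketched in code. This is a conceptual illustration only, not Respeecher's actual API or model; the `Speech` type and `convert_voice` function are hypothetical stand-ins for what a voice conversion system does at a high level:

```python
# Conceptual sketch: voice conversion treats speech as two separable parts,
# the linguistic/performance content and the speaker identity, and swaps
# only the identity. (Hypothetical types, not a real library.)
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Speech:
    content: str        # words, timing, emotion -- the performance
    speaker_id: str     # the vocal identity

def convert_voice(source: Speech, target_speaker: str) -> Speech:
    """Keep the performance, replace only the vocal identity."""
    return replace(source, speaker_id=target_speaker)

original = Speech(content="I'll be there at dawn.", speaker_id="actor_a")
converted = convert_voice(original, target_speaker="actor_a_young")

assert converted.content == original.content    # performance preserved
assert converted.speaker_id == "actor_a_young"  # identity swapped
```

This separation is what makes the use cases below possible: the same performance can be rendered in a younger voice, a different voice, or an accented voice.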
In the content creation industry, we've been focused on use cases where we can optimize voiceover. You can better allocate the load between voice actors. You can enable creatives to bring back voices that are no longer with us, as we did in quite a few film and documentary productions. You can de-age actors; it's common to de-age them visually, and now you can do the same with their voice.
But also, we have some amazing use cases in content creation that are not being seen right away, like animation for documentaries.
When we are interviewing a victim of a crime or of some violence, they might want to keep their identity hidden. What we used to do in the content creation community was put them in contour lighting to hide them visually, and then apply crude voice morphing, so they sound very bad. And this voice morphing does not convey the emotion of what they are saying.
And now we are doing that with documentary creators, but just converting their voice into a completely different voice. And that could be from our library or just from another person.
That keeps the full performance in the content created, while completely changing their vocal identity.
ACC: How do you actually work with your clients? How do they adopt the tool?
A.S.: We were lucky enough to start working with folks who were in charge of adopting new technologies in Hollywood, like Lucasfilm and Skywalker Sound, so they were very open to exploring these opportunities and these new technologies.
We started with edge use cases where you cannot get the voice at all, like creating a young voice for Luke Skywalker, bringing back the voice of Vince Lombardi, or de-aging an actor who had grown older, like a key actor in a massive video game. Those use cases show that without the technology you simply cannot achieve this result.
You cannot bring young Luke to millions of Star Wars fans without applying the technology, and voice impersonators just physically cannot deliver the same speaker identity as the technology can.
And once we hit these early use cases and we started to scale the technology, we found out that it can be used in pre-production, not just in post-production. That can be something that you build your script on.
So we have cases like this: when you're making a documentary about, say, a famous singer, and we are working on one right now, you rely on the technology as a creative tool, because you know what it can give you.
You know the limitations as well as the capabilities of the technology, and you build your story, to some extent, around it. That's the kind of project that excites us a lot.
ACC: This is great: relying on technology as a creative tool. We're talking about smart producing, enabling efficiency, solving certain problems, obviously, but also serving as a creative tool, which is great.
But especially in your field, this comes with risks of the technology being abused. There are issues that always come up when we talk about AI and machine learning. I know you have a strong commitment to the ethics of voice cloning, but what does that mean concretely?
What do you do to actually be able to affirm that you are ethical?
A.S.: Yeah, even before we started the company and began pitching the technology to content makers, we built a very strong ethics policy. Our ethics policy starts with requiring permission from the target voice, the voice that is to be cloned using our tool.
So this permission should be very clear in writing and we should see it. That's something we require in all the projects we are part of.
And in many cases, we even had to educate the industry; we had to bring all those folks together to actually discuss it: IP owners, voice actors, content producers, lawyers, guilds and associations, and even those in charge of unions.
So this is a very new tool and like all tools, it has amazing applications and it has scary applications.
And we are in charge of empowering the amazing applications and protecting not just our technology but society in general from the misuse of technologies like ours, because even if we protect our own technology from misuse 100%, this technology is becoming a commodity. In two years, it will be quite easy to create the same quality of sound that Respeecher was able to create a year ago.
ACC: Your technology also brings some added value in terms of inclusivity. I mean, we were talking about the democratization of the technology to let sound professionals and creators all over the world access your tool.
So, I mean, you allow a lot of different languages to be used, and there are a couple of ways to talk about inclusivity. Can you tell me more about this? How do you see yourself being inclusive and giving access to this technology?
A.S.: Yes, when we started, as I described before, we were obsessed with the quality of the audio, and we traded off usability. That means we ended up with an extremely heavy system.
We used to operate it manually, which required a lot of resources, and it's still quite heavy. It was available only to big studios, because the budgets for using the technology ran very high; it was quite new and not yet optimized back then. It was not fair that only big studios, like Hollywood studios with big budgets, had access to this amazing tool. We always dreamt about democratizing it, so that any small creator who does not have a huge budget for voiceover in their piece of content, or who has no budget at all and wants to do the voiceover themselves, could use it.
They cannot compete with big studios because they cannot put that amount of money into voiceover. Earlier this year, we introduced the Voice Marketplace, a library of voices that enables small creators to use just one actor and make them sound like many different voices, or to voice over whatever they create themselves, and to bring their content to the same level of voiceover quality as big studios. They can start competing with ideas, not budgets, and that's the future I want to live in: a creative space where people compete with ideas.
And talking about inclusivity, it's not just about the democratization of the technology, it's also about what we see in the development localization space, for instance.
Content owners, like studios that produce original content, want to have control over dubbing and localization, as well as good diversity there: they want the diversity they built in initially to be represented in the dubbed and localized versions.
We're talking about the representation of gender and sexual minorities, and of ethnic minorities, and that's quite a hard task when you dub a piece of content into 52 different languages, including regions where this diversity does not really exist locally. Technology becomes an enabler there.
So we see how we can help them do this representation properly, which is quite exciting for us.
ACC: Do you have any new features coming up that you can reveal to us or tease us now?
A.S.: Yeah, just recently, maybe three or four weeks ago, we finally got very good results with accent conversion. We've been working on accents for quite a while, but the way Respeecher works, we never release something before we are happy with the quality we've produced, and given that our internal quality standards are very high, it can take us a long time to introduce a new feature.
But this accent conversion feature works really well, so we can basically apply accents to all the voices we have in the library.
Additionally, we introduced real-time voice conversion this year. It's not available through the Voice Marketplace for small creators yet, but along with that, we opened a new program at Respeecher called Small Creators Program.
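The difference between offline and real-time conversion is mainly one of latency: instead of processing a whole recording at once, audio is processed in small chunks as it arrives. The sketch below is a hypothetical illustration of that chunked-streaming pattern, not Respeecher's implementation; `convert_chunk` is a placeholder for a real conversion model:

```python
# Conceptual sketch of real-time voice conversion (hypothetical): audio
# arrives in small chunks, and each chunk is converted as it comes in,
# so latency stays near one chunk's duration rather than the length of
# the whole recording.
def convert_chunk(chunk: list) -> list:
    # Placeholder for a real per-chunk voice-conversion model.
    # Here it is the identity transform, so output equals input.
    return list(chunk)

def stream_convert(samples: list, chunk_size: int = 4) -> list:
    """Process audio samples chunk by chunk, as a live stream would."""
    out = []
    for i in range(0, len(samples), chunk_size):
        out.extend(convert_chunk(samples[i:i + chunk_size]))
    return out

audio = list(range(10))
assert stream_convert(audio) == audio  # identity placeholder: output == input
```

In a real system, the quality-versus-latency trade-off is set by the chunk size: smaller chunks mean lower latency but give the model less context per step.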
So if you have a case where you would really benefit from voice conversion or voice cloning technology, and the library of voices alone cannot satisfy you because you need to introduce particular voices, and you have a very limited budget, we are open to reviewing it. We take one to two projects per month.
ACC: This is great. So listen, you have been now on the market for nearly five years, and you have seen a lot of things happening in the ecosystem. And I would like to ask you, what are the next smart producing tools you see coming up or you would like to see coming up?
A.S.: When we work on different projects, we often go hand in hand with visual synthesis technologies. It's common to apply Respeecher to change the voice and visual synthesis to actually drive the face of a character. And we see a lot of progress in this area, which means those technologies will also become commoditized and more accessible.
We see a lot of technologies that enable content creators to use just an iPhone, or any smartphone, to create content of a quality that's basically no different from a big studio setup.
And that means the price of production will keep decreasing, and a lot of the tech is already there.
I'm not very experienced in working with those tools myself, but I see from our clients that they have started adopting them.
Basically, they optimize processes. There is more and more tech to optimize the process of pre-production, storyboarding, shooting and filming, and post-production.
All of this is being heavily optimized by new tools right now. We will see which ones stick, and those that stick will be democratized very soon. This is happening very fast.
ACC: Yes, Alex, and as you said before, the timeframe in which a technology becomes a commodity is shrinking ever faster now. This was really a great conversation. Thank you so much for being with us. I will keep in mind that, yes, technology allows more and more processes to be optimized, but in the end, rely on tech as a creative tool. Thank you very much.
A.S.: Thank you so much for having me. And please keep standing with Ukraine.
This year, you will see more articles about the events that Respeecher participates in, right here in our new News section. Subscribe to our newsletter to stay up to date.