
Crafting the Future of Digital Avatars: Respeecher and Pinscreen Join Forces with Motus Lab

Picture this: A virtual avatar that looks and sounds exactly like the person it represents but can be driven by the movements and voice of anyone. In this case study, we'll examine Respeecher’s partnership with Pinscreen and the Motus Lab team at the University of Sydney to create a virtual avatar for Mike Seymour, a digital humans researcher and leader of Motus Lab.

 

The Challenge

Our emotional connection with the people we interact with is largely determined by two factors: their face and their voice. Pinscreen, a tech startup founded by Hao Li, and Motus Lab are leaders in developing AI-based technology that allows one actor to drive the face of another, fully inhabiting that person. One use of this technology is dubbing content without creating a disconnect between face and voice. But the visual part of a virtual avatar is only half the story. Equally important is allowing one person to drive the voice of another.

 

The Solution: Respeecher's Voice Cloning Technology

By converting the voice of Grant Reaber, Respeecher’s Chief Research Officer, to sound like Mike Seymour, the team managed to create a stunningly lifelike virtual avatar, with both Mike’s face and voice being driven by Grant’s performance.

How was this achieved? The team shot the demo with multiple cameras tracking Grant's facial expressions to ensure the visual model had an accurate representation of them; no special capture sessions or training on Grant's appearance were needed. Meanwhile, Respeecher trained a model on Mike's voice capable of converting Grant's (or anyone's) speech into Mike's. Though Grant and Mike have very different speaking styles and accents (Grant is American and Mike is Australian), the effect is compelling.
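The case study does not describe the models behind Respeecher's system, but the separation it relies on can be sketched in a toy form: content and timing come from the driving speech, while speaker identity comes from a model built on the target speaker's recordings. Everything below (the function names, the "embedding", the per-frame energy feature) is invented for illustration and is far simpler than any real voice conversion system.

```python
import numpy as np

FRAME = 160  # samples per analysis frame (10 ms at a 16 kHz sample rate)

def extract_content_features(wave):
    """Crude stand-in for content features: per-frame energy,
    which preserves the timing of the driving performance."""
    n = len(wave) // FRAME
    frames = wave[: n * FRAME].reshape(n, FRAME)
    return np.sqrt((frames ** 2).mean(axis=1, keepdims=True))  # shape (n, 1)

def train_target_voice(recordings, dim=4):
    """Derive a toy 'voice embedding' from recordings of the target speaker."""
    level = extract_content_features(recordings).mean()
    return np.full(dim, level)  # shape (dim,)

def convert(source_wave, target_embedding):
    """Speech-to-speech conversion sketch: content/timing from the source,
    speaker identity from the target embedding."""
    content = extract_content_features(source_wave)  # (n, 1)
    return content * target_embedding                # broadcast to (n, dim)

rng = np.random.default_rng(0)
mike_recordings = rng.standard_normal(16000)  # stand-in for target-voice data
grant_speech = rng.standard_normal(8000)      # stand-in for driving speech

embedding = train_target_voice(mike_recordings)
converted = convert(grant_speech, embedding)
print(converted.shape)  # one target-voice frame per source frame
```

The key property the sketch demonstrates is that the converted output has exactly one frame per frame of the driving speech, so the timing of the original performance survives while the speaker identity is replaced.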

"The research teams are rapidly advancing the art of re-enactment to deliver a faster and more faithful rendition of the original actor’s actions in a way that is most true to their performance," said Mike Seymour.

 

The Result

The collaboration between Respeecher, Pinscreen, and the Motus Lab team was nothing short of magical.


"Today, while AI is rapidly advancing, it is great to see two ML companies that started years ago still expanding and servicing the M&E industries today," noted Seymour. Despite challenges from the war in Ukraine and working through extreme conditions, Respeecher's team continues to service partners and productions worldwide.

 

Conclusion

This case study is an example of how AI and machine learning are transforming the entertainment industry, particularly in voice and speech technology. Respeecher's collaboration with Pinscreen and Motus Lab demonstrates the incredible power of combining visual and audio re-enactment to create more authentic and engaging experiences for audiences.

FAQ

What is a digital avatar?

A digital avatar is a virtual version of a person, typically a 3D character that looks like a real human and reproduces their features, such as facial expressions and voice. Avatars are used in entertainment, games, and virtual spaces.

How can a virtual avatar speak with someone else's voice?

Respeecher's voice cloning technology makes this possible: a model trained to generate the voice of a target individual allows anyone to drive the avatar's speech, regardless of their own accent or speaking style.

How does the technology work?

It uses AI to perform both face and voice re-enactment, creating highly realistic digital humans that reproduce a person's expressions and voice. The underlying machine learning models perform speech-to-speech conversion and face re-enactment.

Can the technology handle different accents?

Yes. Respeecher's voice cloning can convert between accents; for example, it transformed Grant's American-accented speech to sound like Mike's Australian voice, demonstrating how flexible and precise the technology is.

How does this benefit the entertainment industry?

Voice cloning and AI-driven virtual humans enable more authentic and seamless dubbing, re-enactment, and digital avatar creation, giving the entertainment industry realistic digital experiences that enhance storytelling and audience engagement.

What did this collaboration achieve?

Respeecher, Pinscreen, and Motus Lab developed highly realistic digital avatars by combining voice cloning with face and voice re-enactment, creating lifelike avatars for entertainment and a more immersive and accurate virtual human experience.

Glossary

Digital avatars

Virtual representations of real people, driven by AI, that reproduce a person's appearance, expressions, and voice to create a realistic digital experience.

Voice cloning

Technology that reproduces a specific person's voice. Respeecher's system uses speech-to-speech conversion, so that one person's performance can be rendered in another person's voice.

Speech-to-speech conversion

The process of transforming one person's speech so that it sounds like another person's voice, while preserving the content, timing, and expressiveness of the original performance.

Reshaping Digital Avatars with
Respeecher's Voice Cloning Technology

The future of digital avatars is being shaped by breakthrough innovations in AI and machine learning for entertainment. Respeecher's voice cloning technology, combined with face and voice re-enactment, has taken virtual avatars to new heights. The collaboration with Pinscreen and Motus Lab delivers AI-driven virtual humans with a realism never experienced before.

  • Ensuring Seamless Merging of Voice
    with Visuals

    In the realm of virtual avatars, appearance is only half the equation, at most. For a digital avatar to spring to life, it needs the right voice to fit the person portrayed. This challenged Respeecher, Pinscreen, and Motus Lab not only to develop separate products but to integrate face and voice re-enactment into one. For the result to work in film and other sectors of the entertainment industry, the virtual avatar has to look like its real-life model, and its speech has to sound like them too. The task was to make the voice of the AI-driven virtual human match the performer's facial expressions and movements, so that voice and visuals come together without hiccups.

    Key to the solution is voice cloning technology. The teams needed a way for a single actor not only to control the avatar's facial expressions but also to generate a voice true to the person being represented. This meant a voice performance could be transferred from one person to another without sacrificing authenticity. The solution combined sophisticated AI tools with speech-to-speech conversion and face and voice re-enactment, creating a virtual avatar that looked and sounded like the person it represented.

  • Respeecher's Voice Cloning
    Technology in Action

    The breakthrough came with Respeecher's voice cloning technology, which enabled the virtual avatar to take on the voice of Mike Seymour even while being controlled by Grant Reaber, Respeecher's Chief Research Officer. The technology uses speech-to-speech conversion: one person's speech is reproduced to sound like another's while keeping key features of the original performance. Respeecher trained a model on Mike Seymour's unique speaking characteristics, so that Grant's voice was transformed to sound like Mike's, with his speech patterns, intonation, and accent.

    Despite the challenge posed by the difference in accents (Grant's American, Mike's Australian), the results were compelling. This was a significant leap for AI-driven virtual humans: a virtual avatar could now seamlessly reproduce both the visual and auditory aspects of a real person, regardless of the actor behind it. Voice and face were synchronized by recording with several cameras that tracked the performer's facial expressions and movements, allowing the avatar to perform in a manner nearly indistinguishable from the original actor and bringing depth and authenticity to the digital experience.

  • A New Lease of Life for Digital Avatars
    and AI-Driven Virtual Humans

    Respeecher, Pinscreen, and Motus Lab came together to create a virtual avatar that was not just visually on point but also featured a voice that perfectly matched the original. This became a proof point for face and voice re-enactment of digital humans with integrated voice cloning technology in entertainment. The new capability opened the door to a host of possibilities, particularly in film, TV, and gaming, where such avatars are in high demand. Thanks to machine learning, the teams refined the process to make avatars both more efficient to produce and higher in quality, yielding a realistic digital experience arguably indistinguishable from reality: actors' performances could be recorded and duplicated with incredible fidelity, without any specialized training or bespoke capture sessions. The technology also brought new dimensions to dubbing and voiceover work, making it more immersive by letting actors' voices dub characters in foreign languages while maintaining authenticity.

  • The Future of AI in Entertainment
    and Digital Humans

    This case study is only an overview of the changes AI and machine learning are bringing to the entertainment business. By combining Respeecher's voice cloning technology with face and voice re-enactment, Pinscreen and Motus Lab proved it is possible to create AI-driven virtual humans that are nearly indistinguishable from real actors. The collaboration not only advanced the technology behind digital avatars but also raised the quality benchmark for realistic digital experiences. As AI in entertainment develops further, realistic virtual avatars will become increasingly common, giving filmmakers, game developers, and content creators new ways to engage audiences that they could previously only dream about. This is the beginning of a new era in which virtual avatars are no longer passive images but fully realized, dynamic characters with real voices, ready to bring stories to life in more immersive ways than ever before.
