Our unique technology can change your voice to that of another person (e.g., a celebrity) while preserving all the subtle detail of how you say what you say.
About the system
We leverage recent breakthroughs in deep learning, a branch of machine learning, that allow artificial neural networks to produce high-quality synthetic speech. Until now, these techniques have mainly been used for text-to-speech conversion. Because text carries very little prosodic information, text-to-speech output tends to sound rather monotone. By converting speech to speech instead, we circumvent this problem: the intonation of the input speech is copied to the output.
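To make "copying the intonation" concrete, here is a minimal sketch of a classic, non-neural baseline: the source pitch (F0) contour is kept, but shifted and scaled in the log domain to fit the target speaker's pitch range. This is not our neural system, only an illustration of the idea. The function name `convert_f0`, the per-speaker statistics and the example values are all hypothetical.

```python
import numpy as np


def convert_f0(source_f0, src_mean, src_std, tgt_mean, tgt_std):
    """Map a source pitch contour into the target speaker's pitch range.

    source_f0: per-frame pitch in Hz, with 0 marking unvoiced frames.
    *_mean, *_std: mean and standard deviation of log-F0 for each speaker
    (assumed to be estimated beforehand from voiced frames of their speech).
    """
    source_f0 = np.asarray(source_f0, dtype=float)
    converted = np.zeros_like(source_f0)  # unvoiced frames stay at 0
    voiced = source_f0 > 0
    log_f0 = np.log(source_f0[voiced])
    # Normalize by the source statistics, then rescale to the target's.
    converted[voiced] = np.exp((log_f0 - src_mean) / src_std * tgt_std + tgt_mean)
    return converted


# Hypothetical example: a lower-pitched source, a higher-pitched target.
source_f0 = [0.0, 100.0, 120.0, 0.0]
converted = convert_f0(source_f0, np.log(100.0), 0.2, np.log(200.0), 0.2)
```

With these made-up statistics, the two voiced frames come out at roughly 200 Hz and 240 Hz: the rise in pitch from the first frame to the second survives, which is the sense in which the shape of the intonation is carried from input to output.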
Why the world needs voice conversion
With our technology, movies can be dubbed using the voices of the original actors, providing a big wow factor. Relatedly, production companies often need to re-record short bits of dialog after principal photography (automated dialogue replacement, or ADR); with our technology they can generate those lines when the actor is no longer easily available. Finally, our technology will reinvigorate porn parody films with famous voices.
Any actor can speak with a famous voice. Talent can fall ill or even die, but their voice can live on: audiobooks narrated in the author’s voice, or tribute concerts for singers who have passed away.
A whole call center can speak with one good voice. It could be the voice of a celebrity or of the business owner. Alternatively, a call center can switch between voices, matching each one to the customer on the line.
Entertainment such as karaoke, new highly immersive VR games as well as traditional online games will need voices that our technology is poised to provide.
Personalized synthetic voice for people with speech problems
These samples were generated by our current prototype, trained on the CMU Arctic dataset. The trained system takes a recording of a source speaker (“Source” column) and produces a result (“Target, converted” column) that sounds as if it were spoken by the target person (“Target, true” column). Note that the true target samples are given here just for comparison; the system uses only the source recording for conversion. Neither the source nor the target samples below were seen by the model during training.
|Source|Target, true|Target, converted|
Former CTO at IBDI. Publications in Phys Rev and CCE. Expertise in machine learning, dynamical systems and distributed computing. Experience in building, guiding and managing research and software development teams.
PhD candidate in math at Carnegie Mellon. PhD in philosophy from NIP Aberdeen. Accepted at the NextAI incubator (for a previous speech tech project). Expertise in applied math and deep learning.
BSc degree in systems engineering from Kiev Polytechnic Institute.
Former CEO at IBDI. Successfully built several tech-centered companies from scratch. Developed sales processes and built tech and business teams.