by Anna Bulakh – Aug 9, 2022 8:07:14 AM • 8 min

What Is Singing Voice Synthesis and Is It Even Possible?

With advancements in voice cloning, the ability to synthesize vocals to sound like another person, or sing with perfect pitch in different languages, is no longer science fiction. It is now possible to vocalize text in any tone of voice, including that of a child. But what if you want to synthesize… singing? Is AI singing possible? Let’s find out.

What is singing voice synthesis?

Singing voice synthesis (SVS) is a method of generating a singing voice from musical scores with lyrics using computer models. 
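
To make "musical scores with lyrics" concrete, here is a minimal sketch of what the input to an SVS model could look like and how it expands into a target pitch contour. The `Note` class, the function names, the 10 ms frame size, and the three-note phrase are all illustrative assumptions, not part of any specific SVS system. The MIDI-to-frequency formula is the standard equal-temperament conversion with A4 tuned to 440 Hz.

```python
from dataclasses import dataclass


@dataclass
class Note:
    syllable: str      # lyric syllable sung on this note
    midi_pitch: int    # MIDI note number (69 = A4)
    duration_s: float  # note length in seconds


def midi_to_hz(midi_pitch: int) -> float:
    """Equal-temperament conversion with A4 (MIDI 69) tuned to 440 Hz."""
    return 440.0 * 2.0 ** ((midi_pitch - 69) / 12.0)


def score_to_f0(notes, frame_s=0.01):
    """Expand a note list into a frame-level target pitch contour in Hz."""
    contour = []
    for note in notes:
        n_frames = round(note.duration_s / frame_s)
        contour.extend([midi_to_hz(note.midi_pitch)] * n_frames)
    return contour


# Hypothetical three-note phrase, not taken from any real song.
phrase = [
    Note("la", 69, 0.50),
    Note("la", 71, 0.25),
    Note("la", 72, 0.25),
]
f0 = score_to_f0(phrase)
```

A real model would predict a far smoother contour, with vibrato and note transitions, along with timbre. This stepwise contour is only the score-level target such a model starts from.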

Singing synthesis has been developing since the 1950s and, like text-to-speech, revolves around two paradigms: statistical parametric synthesis, which uses statistical models to reproduce the features of a voice, and unit selection, in which snippets of vocal recordings are recombined on the fly. Thanks to recent advances in voice AI technology, maestros can listen to a song immediately after composing it, no recording necessary.
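
The unit selection idea can be illustrated with a toy example. The sketch below picks one recorded snippet per note so that each snippet is close to the target pitch (target cost) and adjacent snippets join smoothly (join cost). The function name, the cost definitions, and the candidate data are assumptions made for illustration. Production systems use much richer acoustic features than a single pitch value.

```python
def select_units(targets, candidates, join_weight=1.0):
    """Pick one recorded unit per target note, minimizing total cost.

    targets:    target pitches in Hz, one per note
    candidates: for each note, a list of (unit_id, start_hz, end_hz)
    Target cost: |unit start pitch - target pitch|
    Join cost:   join_weight * |previous unit end pitch - unit start pitch|
    """
    # best[i][j] = (cheapest path cost ending in candidate j of note i, backpointer)
    best = [[(abs(c[1] - targets[0]), None) for c in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for c in candidates[i]:
            target_cost = abs(c[1] - targets[i])
            cost, back = min(
                (best[i - 1][k][0] + join_weight * abs(p[2] - c[1]), k)
                for k, p in enumerate(candidates[i - 1])
            )
            row.append((cost + target_cost, back))
        best.append(row)

    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j][0])
        j = best[i][j][1]
    return path[::-1]


# Hypothetical snippet inventory: (unit_id, start_hz, end_hz).
targets = [440.0, 494.0]
candidates = [
    [("a1", 438.0, 450.0), ("a2", 445.0, 490.0)],
    [("b1", 494.0, 494.0), ("b2", 460.0, 470.0)],
]
chosen = select_units(targets, candidates)
```

Here `chosen` is `["a2", "b1"]`: snippet `a1` is closer to the first target pitch, but `a2` ends near 494 Hz and so joins more smoothly with `b1`. Setting `join_weight=0.0` ignores smoothness and selects `a1` instead.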

Modern SVS models can generate the natural singing voice of a singer in any language using vocals from the original score and recordings of singers in the target languages. This is called cross-lingual singing voice synthesis, which produces remarkably realistic AI voices.

In recent years, the following technologies have been used to achieve SVS:

  • generic deep neural networks (DNN)
  • convolutional neural networks 
  • recurrent neural networks with long short-term memory (LSTM)
  • generative adversarial networks (GAN)

Use cases for singing voice synthesis

Singing voice synthesis technology, powered by AI-generated voices, allows musicians and singers to instantly hear how their written music will sound. It’s no longer necessary to record a piece of music, with all the time, money, and resources that requires, or to hire a team to assist with recording sessions.

Another critical use case is creating music for games and other projects that demand extensive audio support. Recording songs with real artists is extremely expensive for video game producers. Singing voice synthesis, powered by generative AI, allows smaller indie developers to produce songs from musical scores and text using existing voices.

Artists who want to reach a global audience with their message and support different groups of people around the world can also benefit from cross-lingual singing voice synthesis. With the assistance of AI singers, they now have an inexpensive means of distributing their message in any language.

How does cross-lingual singing voice synthesis work? Respeecher’s example

When synthesizing the singing voice of a particular performer, specialists begin by using samples of their vocals.

In total, about an hour of an individual’s vocals is needed to construct an initial model, and 10-15 minutes of recording are used for the synthesis process. This meticulous approach ensures the creation of a realistic AI voice that accurately reflects the nuances and characteristics of the original performer's singing style.

These recordings are loaded into a neural network, which then generates a voice, taking into account all possible nuances. The result is a synthesized voice that is almost indistinguishable from the original.

This is how Respeecher implements cross-lingual singing voice synthesis:

On the fourth anniversary of the death of Swedish musician Tim Bergling, known professionally as Avicii, one of his best-known collaborators, Aloe Blacc, paid tribute to the artist. He performed and recorded Avicii’s hit “Wake Me Up” in English, Mandarin, Spanish, Italian, and French using AI voice synthesis. His aim was to let more people around the world appreciate Avicii’s talent more deeply.

Since Aloe’s aim was to sing the song flawlessly, not only in English but also in Mandarin, Spanish, Italian, and French, he was going to need some technological help from singing voice synthesis experts.

To keep the lyrics accurate while still following the natural beat of the song, Aloe Blacc turned to Respeecher and Metaphysic.ai.

First, Aloe Blacc recorded a video of himself singing “Wake Me Up” in English. To let him also sing in Mandarin, Spanish, Italian, and French, the Respeecher team took recordings of other singers performing the song in these languages and converted them into Blacc’s voice using generative AI technology.

Then, Metaphysic.ai was tasked with lip-syncing Blacc’s mouth movements so that they appear natural when he sings in each language. This synchronization process, combined with the use of AI-generated voice technology, ensured a seamless and authentic performance across the different linguistic renditions of the song.

In a Nutshell

Thanks to singing voice synthesis technology, artists can “sing” in as many languages as they want. AI speech-to-speech technology clones a performer’s voice and reproduces it so that the same material can be performed in a foreign language in the same voice. All you need is at least one native speaker of each language you intend to reproduce your content in.

We encourage you to get in touch with Respeecher for a brief consultation regarding the use of our technology and scaling singing voice synthesis to meet the demands of your use case.

Anna Bulakh
Head of Ethics and Partnerships
Blending a decade of expertise in international security with a passion for the ethical deployment of AI, I stand at the forefront of shaping how emerging technologies intersect with national resilience and security strategies. As the Head of Ethics and Partnerships at Respeecher, I focus on guiding ethical AI development. My role is centered around promoting the responsible use of AI, especially in synthetic media.