by Anna Bulakh – Jun 12, 2024 6:42:30 AM • 8 min

Speech Synthesis Is No More a Villain than Photoshop Was 10+ Years Ago

Modern technologies deliver many benefits, completely transforming many areas of our lives. However, by changing the way we perceive reality, modern technologies sometimes create more problems than they solve.

They can fake, fool, and disrupt moral norms. Today, we will dive into two popular and controversial media technologies that are perceived in completely different ways.

A Brief History of Adobe Photoshop

Adobe Photoshop is an incredibly rich graphics editor, but it hasn't always been that way. When it was first released in 1990, it was a long way from the functional and powerful tool that we know and love today.

The original code for the program was developed by two brothers (with a little help from their father). In the 1980s, John Knoll was working at Industrial Light & Magic, the Lucasfilm visual effects division behind Star Wars.

His brother Thomas was studying image processing at the University of Michigan.

One day, Thomas was writing a program on his Macintosh Plus, but the computer's monochrome display could not show grayscale images. To work around this, Thomas decided to write his own program, Display.

After seeing his brother's work, John convinced Thomas to turn Display into a full-fledged graphics editor. In 1988, the brothers bought a new Macintosh II computer and spent the next six months creating a program that could work, among other things, with color images.

Thomas wanted to rename the software ImagePro, but that name was already taken. The revised version of the program was called Photoshop instead.

The first 200 copies of Photoshop were released and sold under the Barneyscan brand. By 1989, John had successfully presented the new graphics editor to the art director of Adobe, and the company went on to purchase a license to sell Photoshop.

The first version of Adobe Photoshop was released in 1990. British artist David Hockney was invited to the presentation; after three days of experimenting with the new program, the artist said: "Photography starts in a drawing, and now it comes back."

Common Use Cases for Photoshop Today

Photoshop is used by photographers, artists, graphic and web designers, and pretty much anyone who works with computer image processing in one way or another. The most common use cases for Photoshop include:

Image correction. Photoshop makes life easier for professional photographers and hobbyists alike, even if they shoot with shaky hands on a smartphone. After all, nearly every flaw in a photograph can be corrected in the program: a tilted horizon, perspective distortion, poor lighting, low contrast, or an unwanted color tint.
Retouching. A tilted horizon and poor lighting are not the only troubles a photographer faces. Sometimes the problem is a pimple on a model's face or a person in the background who wandered into an otherwise artistic shot. Retouching is the processing of images to remove such flaws.
Collages and montages. Another use of Photoshop is combining fragments of different images into one, creating collages and photomontages.
Mockups. Photoshop helps clients and designers understand how layouts will appear on finished products. To achieve this, graphic designers present their solutions using mockups - photographs or 3D models of real objects, on which the layout of the future design is superimposed.

Controversies Surrounding Photoshop

Although Photoshop delivered multiple innovations and created an industry around graphic design, it has also caused problems that outraged certain groups in the past decade.

Many believe that Photoshop creates an unattainable image of beauty because it can dramatically change a person’s appearance: raise cheekbones, change skin clarity and eye color, lengthen legs, reduce the waist, and so on.

When people are constantly exposed to heavily edited pictures of models, their standards for what is normal begin to change. And if what they see in the mirror doesn’t meet these standards, they start feeling anxious about their appearance, style, and so on. This can lead to eating disorders such as bulimia and anorexia, as well as other psychological problems.

Beyond concerns around unrealistic beauty standards, Photoshop has also been labeled an instrument of propaganda. Simply cropping a photo in a certain way can change its meaning to support a narrative the original image never intended. Edited images can be used to manipulate information for political purposes.

“Like any tool, it can be used to do good things or bad things,” said Thomas Knoll about his invention. Today, heated discussions about Photoshop are far less common. However, thanks to this program and its controversies, people know that not everything they see is the truth.

Voice Synthesis - Another Harmful Technology?

Photoshop and video deepfakes demonstrated that not everything we see can be trusted. Now audio fakes can make us consider the veracity of the information we hear. In some cases, audio deepfakes can be a dangerous weapon of deception.

Speech synthesis can make your voice sound like someone else's. This voice AI technology is the artificial simulation of human speech by a computer, implemented in software or hardware known as a speech generator.

Respeecher uses a super-resolution algorithm to deliver the highest-resolution audio across the board, even if the original recording is not of the highest quality. You can read more about it in this audio super-resolution whitepaper.

Voice cloning technology can be dangerous: it can fool people into thinking someone said something they didn't. To prevent this from happening, we at Respeecher use watermarking technology and follow a strict ethical code. You can read more about it on the Respeecher FAQ page.

In classic concatenative systems, synthesized speech is assembled from pieces of a recorded voice stored in a database. Speech synthesis comes in two main forms: text-to-speech and speech-to-speech.

Speech synthesis software is mostly used in the creative industries. These are the most common use cases of the technology:

Film and TV. Cloned voices are used for dubbing an actor’s voice in post-production and can even revive the voice of an actor who has long since passed away. A speech generator recreates the actor's voice from existing recordings and samples.
Animation. Speech synthesis software allows your animations to speak the way you want them to, using AI voices that can be customized to fit the characters and narrative of your project.
Game development. Voice synthesis is regularly used to create characters in video games so that they sound exactly like the specific characters they are based on, employing generative AI to ensure authenticity and immersion for players.
Podcasts and audiobooks. The narrator’s voice can be changed to the author’s voice, allowing the audience to listen to the author reading their own words with the assistance of voice AI technology.
Advertising. Speech synthesis helps you tailor your ads to particular audiences by using region-specific pronunciation.
Dubbing and localization. The technology allows you to streamline the dubbing process, making it more agile through the use of speech generator technology.

A great example of voice synthesis in action was the 2021 revival of Vincent Lombardi for the Super Bowl. The American football legend came back to life on screen and spoke to the audience minutes before the singing of “America the Beautiful”. Through advanced AI voice technology, Lombardi's iconic voice was recreated with stunning realism, captivating viewers and adding a touch of nostalgia to the event.

Another example is the revival of Manuel Rivera Morales' voice, which narrated the Olympic Women’s Basketball match between Puerto Rico and China on Tuesday, July 27, 2021.

However, there are many other use cases of the technology beyond entertainment purposes.

How to Use Speech Synthesis Software Legally

Just like Photoshop, speech synthesis can be used for malicious purposes. It can fool people into believing that someone said something they never did.

The first documented use of an audio deepfake in a scam occurred in March 2019. The criminals persuaded the CEO of a British energy company to transfer €220,000 into their bank account.

The CEO thought he was speaking with his boss, the head of the German parent company, who asked him for an urgent transfer. As the victim later stated, the criminals were able to clone his boss’s slight German accent.

Today, voice cloning is capable of incorporating the variety of emotional accents and nuances of an original voice, making it almost impossible to distinguish a real voice from a fake one.

Ethical concerns have arisen around the use of cloned voices of deceased people. Moreover, some psychologists believe that the creation of audio doubles of people who have passed away may provoke mental instability.

Today, however, speech synthesis software that is committed to a clear set of ethical principles is readily available. These principles include:

One of the most important things to remember is not to use the voices of private persons without permission. Voice owners should give their written consent before their speech is cloned.
To make synthesized speech easy to distinguish from other content, voice synthesis software should embed a unique audio watermark in its output.
Voice synthesis software shouldn’t offer a public API for creating voices.
The voice cloning provider should work only with clients it trusts and approve only projects that meet strict ethical standards.
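
To make the watermarking principle more concrete, here is a minimal Python sketch of one textbook approach, a keyed spread-spectrum mark. It is a toy illustration only and not Respeecher's actual method. The function names, the key values, and the strength parameter are all assumptions made for this example, and a production watermark would also need to survive compression, resampling, and editing.

```python
import math
import random
from typing import List, Sequence


def keyed_sequence(key: int, n: int) -> List[float]:
    """Pseudo-random +1/-1 sequence that only the key holder can reproduce."""
    rng = random.Random(key)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]


def embed_watermark(samples: Sequence[float], key: int,
                    strength: float = 0.02) -> List[float]:
    """Add a quiet keyed noise pattern to the audio samples."""
    seq = keyed_sequence(key, len(samples))
    return [s + strength * c for s, c in zip(samples, seq)]


def watermark_score(samples: Sequence[float], key: int) -> float:
    """Correlate the audio with the keyed pattern.

    The score is close to `strength` when the mark is present
    and close to zero when it is absent or the key is wrong.
    """
    seq = keyed_sequence(key, len(samples))
    return sum(s * c for s, c in zip(samples, seq)) / len(samples)


def is_watermarked(samples: Sequence[float], key: int,
                   strength: float = 0.02) -> bool:
    """Decide by comparing the score with half the embedding strength."""
    return watermark_score(samples, key) > strength / 2


# Hypothetical demo signal: a 220 Hz tone sampled at 16 kHz.
RATE = 16_000
host = [0.5 * math.sin(2 * math.pi * 220 * t / RATE) for t in range(100_000)]
marked = embed_watermark(host, key=1234)

if __name__ == "__main__":
    print("marked, right key:", is_watermarked(marked, key=1234))
    print("unmarked audio:   ", is_watermarked(host, key=1234))
    print("marked, wrong key:", is_watermarked(marked, key=9999))
```

In this sketch, the detector should report a mark only for the marked signal checked with the matching key. The idea it illustrates is that a provider who holds the key can later check whether a clip came from its own software.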

Yes, synthetic media can be dangerous. While some people use it to revolutionize movies, video games, and other creative projects, others can leverage it to fool and rob people.

That's why sticking to the ethical principles of using such technology, educating people on the boundaries of what is permitted, and creating entertaining products that will not harm people should be your highest priority.

Anna Bulakh

Head of Ethics and Partnerships

Blending a decade of expertise in international security with a passion for the ethical deployment of AI, I stand at the forefront of shaping how emerging technologies intersect with national resilience and security strategies. As the Head of Ethics and Partnerships at Respeecher, I focus on guiding ethical AI development. My role is centered around promoting the responsible use of AI, especially in synthetic media.