China’s AI / Neural Network Answer to Google Can Clone Your Voice Within Just Seconds of Hearing It

This… does not sound good.

PATRICK CAUGHILL, FUTURISM

3 MAR 2018

Not only can the software mimic an input voice, but it can also change it to reflect another gender or even a different accent.

You can listen to some of the generated examples here, hosted on GitHub.

Adobe has a program called VoCo which could mimic a voice with only 20 minutes of audio. One Canadian startup, called Lyrebird, can clone a voice with only one minute of audio.

Baidu’s innovation has further cut that time to mere seconds.

For example: imagine your child being read to in your voice when you’re far away, or having a duplicate voice created for a person who has lost the ability to talk.

This tech could also be used to create personalized digital assistants and more natural-sounding speech translation services.

However, as with many technologies, voice cloning also comes with the risk of being abused.

New Scientist reports that the program was able to produce one voice that fooled voice recognition software with greater than 95 percent accuracy in tests.

Human listeners even gave the cloned voice a score of 3.16 out of 4. This could open up the possibility of AI-assisted fraud.

Programs exist that can use AI to replace or alter – and even generate from scratch – the faces of individuals in videos. Right now, this is mostly being used on the internet to bring laughs by inserting Nicolas Cage into the Lord of the Rings series.

But coupled with tech that can clone voices, we could soon be bombarded with more “fake news” of politicians doing uncharacteristic things or saying things they wouldn’t.

It’s already very easy to fool swathes of people using just the written word or Photoshop; there could be even more trouble if these technologies were placed into the wrong hands.

from: https://www.sciencealert.com/china-search-engine-baidu-ai-development-clones-voices

China’s Google Equivalent Can Clone Voices After Seconds of Listening

In Brief

Baidu’s AI research team has developed a neural network that can mimic a voice from a sample less than a minute long. The software can also change the voice to reflect other genders and accents.

AI Mimicry

The Google of China, Baidu, has just released a white paper showing its latest development in artificial intelligence (AI): a program that can clone voices after analyzing even a seconds-long clip, using a neural network. Not only can the software mimic an input voice, but it can also change it to reflect another gender or even a different accent.

You can listen to some of the generated examples here, hosted on GitHub.

Previous iterations of this technology have allowed voice cloning after systems analyzed longer voice samples. In 2017, the Baidu Deep Voice research team introduced technology that could clone voices with 30 minutes of training material. Adobe has a program called VoCo which could mimic a voice with only 20 minutes of audio. One Canadian startup, called Lyrebird, can clone a voice with only one minute of audio. Baidu’s innovation has further cut that time to mere seconds.

While at first this may seem like an upgrade to tech that became popular in the 90s, with the help of “Home Alone 2” and the “Scream” franchise, there are actually some noble applications for this technology. For example: imagine your child being read to in your voice when you’re far away, or having a duplicate voice created for a person who has lost the ability to talk. This tech could also be used to create personalized digital assistants and more natural-sounding speech translation services.

However, as with many technologies, voice cloning also comes with the risk of being abused. New Scientist reports that the program was able to produce one voice that fooled voice recognition software with greater than 95 percent accuracy in tests. Human listeners even gave the cloned voice a score of 3.16 out of 4. This could open up the possibility of AI-assisted fraud.

Fake Obama created using AI video tool – BBC News
Researchers at the University of Washington have produced a photorealistic former US President Barack Obama. Artificial intelligence was used to precisely model how Mr Obama moves his mouth when he speaks. Their technique allows them to put any words into their synthetic Barack Obama’s mouth.

Programs exist that can use AI to replace or alter – and even generate from scratch – the faces of individuals in videos. Right now, this is mostly being used on the internet to bring laughs by inserting Nicolas Cage into the “Lord of the Rings” series. But coupled with tech that can clone voices, we could soon be bombarded with more “fake news” of politicians doing uncharacteristic things or saying things they wouldn’t.

It’s already very easy to fool swathes of people using just the written word or Photoshop; there could be even more trouble if these technologies were placed into the wrong hands.

from: https://futurism.com/baidu-clone-voices-seconds/

Neural Voice Cloning with a Few Samples

View the PDF: https://www.bgp4.com/wp-content/uploads/2018/03/Neural-Voice-Cloning-with-a-Few-Samples.pdf

A research team has created software that allows them to control the face of anyone in any video on YouTube. The result is a weird cross between Snapchat’s “face swap” feature and the “gibberish” scene from Bruce Almighty.

We fed 270,000 words spoken by Trump into a computer program that studies language patterns. This system analyzed his word choice and grammar, learning how to simulate Trump’s speech.
Here is the speech written entirely by artificial intelligence.
When prompting the neural network for written output, the system allows the user to select a “temperature”, which tells the program how creative or daring to be with its word choice. At low temperatures, the neural network almost always chooses the most likely next character as it generates a sequence, while at high temperatures it becomes more likely to choose a character that’s farther down the probability list.
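The “temperature” setting described above can be illustrated with a minimal sketch. This is not the code behind the Trump speech generator; it only shows the common technique of dividing log-probabilities by a temperature before sampling. The function names (`apply_temperature`, `sample_char`) and the three-character probability list are made up for illustration.

```python
import math
import random


def apply_temperature(probs, temperature):
    """Rescale a probability list by a temperature and renormalize.

    Assumes probs holds non-negative values with at least one positive entry.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Dividing log-probabilities by the temperature sharpens the
    # distribution when temperature < 1 and flattens it when > 1.
    logits = [
        math.log(p) / temperature if p > 0 else float("-inf")
        for p in probs
    ]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


def sample_char(chars, probs, temperature, rng=random):
    """Draw one character using the temperature-adjusted probabilities."""
    scaled = apply_temperature(probs, temperature)
    return rng.choices(chars, weights=scaled, k=1)[0]


# Hypothetical next-character probabilities from a language model.
chars = ["e", "a", "q"]
probs = [0.7, 0.25, 0.05]

cold = apply_temperature(probs, 0.2)  # low temperature: "e" dominates
hot = apply_temperature(probs, 2.0)   # high temperature: "q" gains ground

print("cold:", [round(p, 4) for p in cold])
print("hot: ", [round(p, 4) for p in hot])
print("sampled at T=2.0:", sample_char(chars, probs, 2.0))
```

With a low temperature, nearly all of the probability lands on the most likely character, so the output is predictable. With a high temperature, the unlikely characters are picked more often, which is what makes the generated text read as more “daring”.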

Donald Trump played by John Di Domenico
Recurrent Neural Network run by Janelle Shane

…