WaveNet is going to help fill in computer speech's uncanny valley.
It's hard to put your finger on why, but the voices our computers use to speak just sound wrong. Even the best voice assistants, like Amazon's Alexa or Apple's Siri, sound—well—robotic when they talk. But that could change soon. Neural networks are now tackling the problem of making computer speech sound more natural, filling sentences with nonverbal sounds like lip smacks, intakes of breath, and irregular pauses.
DeepMind, an Alphabet-owned world leader in artificial intelligence research, recently published a blog post about WaveNet, a convolutional neural network (like DeepDream) that, its researchers say, cuts the performance gap between computer and human speech by about 50%. In other words, in blind listening tests where people rated how natural the speech sounded, WaveNet scored significantly better than other text-to-speech methods, closing roughly half the distance to a real human voice. But how?
How Computer Voices Work Now
As the DeepMind team explains in its post, most computer voices, like Siri's, are built from huge databases of speech fragments recorded from a single speaker and then recombined by a computer to form sentences. This gives decent results, but it has drawbacks. The initial databases are expensive and time-consuming to construct, and they can't be modified without recording a new database from scratch. That's why, incidentally, there are only so many computer voices out there. Apple can't just program Siri to, say, speak with a sexy James Bond accent. The company would have to record hundreds of hours of someone who spoke with that accent first. This approach also contributes to computer speech's uncanny valley problem, creating mostly accurate computer voices that still feel somehow *wrong*, and therefore repulsive, to our human ears. And unlike a human, a computer will say a given word with exactly the same pronunciation and cadence no matter how many times it says it.
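If you like to think in code, here's a minimal sketch of that piecing-together (concatenative) approach. Everything here is a hypothetical stand-in: the `fragment_db` dictionary, the `speak` function, the clip lengths, and the silence fallback are illustrative, not how Siri or Google's systems actually work.

```python
# Minimal sketch of concatenative text-to-speech: a hypothetical fragment
# database maps small units of speech (whole words here, for simplicity)
# to prerecorded audio clips, and "speaking" is just gluing clips together.
import numpy as np

SAMPLE_RATE = 16_000  # audio samples per second

# Pretend these arrays were cut from hours of studio recordings of one speaker.
fragment_db = {
    "hello": np.random.uniform(-1, 1, SAMPLE_RATE // 2),  # 0.5 s clip
    "world": np.random.uniform(-1, 1, SAMPLE_RATE // 2),
}

def speak(text: str) -> np.ndarray:
    """Concatenate prerecorded fragments for each word in the text."""
    clips = []
    for word in text.lower().split():
        clip = fragment_db.get(word)
        if clip is None:
            # A real system falls back to smaller units (phonemes, diphones);
            # here we just insert a short silence.
            clip = np.zeros(SAMPLE_RATE // 10)
        clips.append(clip)
    # Every playback is identical: same clips, same order, same cadence.
    return np.concatenate(clips)

waveform = speak("hello world")
print(waveform.shape)  # (16000,) -> one second of audio at 16 kHz
```

Notice the limitation baked into the design: the voice can only ever say what's in the database, and it says it the same way every time.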
Neural Networks To The Rescue
Here's where WaveNet comes in. By feeding Google's own voice database—the same recordings used for OK Google—into a neural network, DeepMind was able to train WaveNet to actually recreate the sounds it needs to make up a sentence, with millions of incredibly slight variations. If that confuses you, think of it this way: Whereas most computers speak by piecing together blocks of prerecorded sound, WaveNet essentially remembers what those sounds are like and says them out loud when it needs them. It's a key difference that makes WaveNet's text-to-speech samples sound more natural than even those of industry leaders like Google. Compared to the usual method, WaveNet talks in a more flowing, even cadence. Take this sample sentence: "The Blue Lagoon is a 1980 American romance film directed by Randall Kleiser." In Google's default attempt to say it, each individual syllable sounds as if there's an air gap around it. WaveNet, on the other hand, glides from phoneme to phoneme, like a human. Seriously, check it out for yourself.
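For contrast, here's a toy sketch of the generative idea: produce the waveform one sample at a time, with each new sample predicted from the samples that came before it. The `predict_next_sample` function below is a stand-in (a fixed filter plus a little random noise), not the real model, which DeepMind describes as a stack of dilated causal convolutions, so treat this as an illustration of the sampling loop rather than the actual network.

```python
# Toy illustration of sample-by-sample audio generation, WaveNet-style:
# no prerecorded clips, just a model predicting the next sample from the past.
import numpy as np

rng = np.random.default_rng(0)
RECEPTIVE_FIELD = 64  # how many past samples the toy model looks at

# Stand-in for learned weights over the recent past (not trained on anything).
weights = rng.normal(0, 1 / RECEPTIVE_FIELD, RECEPTIVE_FIELD)

def predict_next_sample(history: np.ndarray) -> float:
    """Predict the next audio sample from the recent past (toy version)."""
    context = history[-RECEPTIVE_FIELD:]
    mean = float(np.dot(weights[-len(context):], context))
    # Sampling (rather than always taking the mean) is what gives each
    # utterance the slight variations the article mentions.
    return float(np.tanh(mean + rng.normal(0.0, 0.05)))

def generate(n_samples: int) -> np.ndarray:
    """Build a waveform one sample at a time, conditioned on what came before."""
    samples = [0.0]  # seed with silence
    for _ in range(n_samples - 1):
        history = np.array(samples[-RECEPTIVE_FIELD:])
        samples.append(predict_next_sample(history))
    return np.array(samples)

audio = generate(16_000)  # one second at 16 kHz, built sample by sample
print(audio.shape)  # (16000,)
```

The output here is just noise, of course; the point is the loop. Because every sample is drawn fresh, no two utterances are ever byte-for-byte identical, which is exactly what the concatenative approach can't do.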
Movies like Spike Jonze's Her or even 2001: A Space Odyssey present a future in which talking to a computer is as natural as talking to a human, but the truth is, despite our natural instinct to anthropomorphize our computers, most of us find it frustrating to interact with our UIs through speech. It's early days yet, but it's easy to see how approaches like WaveNet could make virtual assistants sound more natural—starting with Google's own. Given that DeepMind and Google are both owned by Alphabet, don't be surprised if these improvements start rolling out to Google's text-to-speech functionality sooner rather than later.