Rest in Peas: The Unrecognized Death of Speech Recognition

Speech recognition technology reached 80% accuracy in 2001, then leveled off. The human ear has about 98% accuracy. Why haven't computers improved in this area? Robert Fortner looks at several reasons.

Many spoken words sound the same. Saying “recognize speech” makes a sound that can be indistinguishable from “wreck a nice beach.” Other laughers include “wreck an eyes peach” and “recondite speech.” But with a little knowledge of word meaning and grammar, it seems like a computer ought to be able to puzzle it out. Ironically, however, much of the progress in speech recognition came from a conscious rejection of the deeper dimensions of language. As an IBM researcher famously put it: “Every time I fire a linguist my system improves.” But pink-slipping all the linguistics PhDs only gets you 80% accuracy, at best.
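To make the "puzzle it out" idea concrete, here is a minimal sketch of one crude approach: score each candidate transcription by how often its word pairs appear in some text, then keep the highest-scoring one. Everything in it is an assumption made for illustration. The five-sentence corpus is invented, the function names are hypothetical, and the candidates are treated as acoustically identical. Real recognizers combine a score like this with an acoustic score and train on vastly more text. Note that this is word statistics rather than grammar or meaning, which is closer to the approach the IBM quote describes.

```python
import math
from collections import Counter

# Invented toy corpus standing in for real training text (an assumption).
CORPUS = [
    "computers can recognize speech",
    "it is hard to recognize speech",
    "we recognize speech with software",
    "a storm can wreck a boat",
    "a nice beach is hard to find",
]

def train_bigrams(sentences):
    """Count word pairs and how often each word starts a pair."""
    context_counts, bigram_counts, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split()  # <s> marks sentence start
        vocab.update(words)
        context_counts.update(words[:-1])
        bigram_counts.update(zip(words, words[1:]))
    return context_counts, bigram_counts, len(vocab)

def log_score(candidate, context_counts, bigram_counts, vocab_size):
    """Sum of add-one-smoothed log probabilities of each word pair."""
    words = ["<s>"] + candidate.lower().split()
    total = 0.0
    for prev, word in zip(words, words[1:]):
        # Add-one smoothing keeps unseen pairs from scoring log(0).
        prob = (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + vocab_size)
        total += math.log(prob)
    return total

def best_transcription(candidates, sentences=CORPUS):
    """Return the candidate with the highest bigram log score."""
    model = train_bigrams(sentences)
    return max(candidates, key=lambda c: log_score(c, *model))

candidates = [
    "recognize speech",
    "wreck a nice beach",
    "wreck an eyes peach",
    "recondite speech",
]
best = best_transcription(candidates)
print(best)
```

Because the score is a sum over word pairs, longer candidates are penalized simply for being longer, and the result depends entirely on what the corpus happens to contain. Both are reasons a toy like this says little about real-world accuracy.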

We can take comfort in knowing that the human brain is still way ahead of machines. Link -via Metafilter

(image source: Creative Coffins)

Newest 5 Comments

Speech recognition is actually very far from dead. In fact, it's thriving. Stenographers are people who handle transcripts in court, various legal purchases, provide CART services, and do closed captioning.

However, stenographers are becoming a dying breed as voice-writers (that is, people who "talk" to a program on their computer) are on the rise, being much more efficient, faster, and more accurate. It's also an alternative for people who want to do stenography but have carpal tunnel syndrome or other debilitating conditions.

That's impressive. I study speech & hearing science, and I'm currently in a class all about acoustic signals (especially in terms of cochlear implants), and that's a pretty tricky thing to do. Very cool.

Mallory: It has true voice recognition. No preprogramming required, and it will recognize ANY voice. And it's integrated into the OS, so it works with Google searches, map lookups, composing text/email messages... everything.

On your Droids, do you program your speech into them (for instance: Call Sandra), or do they recognize speech and then create text?

I ask because if it only has to compare what you say against a voice sample of your own, there will be much less variance to wade through. Otherwise, the speech recognition software will have a much bigger sample to sort through.