Google wants to talk

Google's Mike Cohen won't be satisfied until anyone who wants to talk to their computer can do so without laughing at the hideous translation or sighing in frustration.

Cohen, a leading figure in speech technology circles, heads up Google's efforts to advance the science of speech technology while applying it to as many products as possible. "Google's mission is to organise the world's information, and it turns out a lot of the world's information is spoken," Cohen said in a recent interview with ZDNet Australia's sister site CNET about the search giant's speech ambitions.

Google is attempting to produce voice-recognition technology that fits in with its view that the computing universe is shifting toward mobile devices and browser-based applications. That means easy-to-use software that does the heavy lifting in the datacentre, so that it can run over the internet on a mobile device with limited hardware.

Computer speech recognition seems like it has been five to 10 years away for decades. Indeed, the electronics and computer industries have been chasing the goal of voice-directed computers for nearly 100 years, ever since Radio Rex, a simple wooden toy dog released in 1911, first captivated children and adults by shooting out of his doghouse (at least some of the time) when his owners called "Rex!" (Cohen owns one of the few remaining gadgets.)

Huge advances have obviously been made since the 1920s, yet few of us use our computers like HAL in 2001: A Space Odyssey or KITT, the computerised car in Knight Rider. Cohen, however, believes the industry is about to silence the jokes about amusingly garbled voice mails as speech recognition models grow more sophisticated, engineers pack mobile computing devices with more powerful hardware, and users start to realise that performance has made great strides.

"The goal is complete ubiquity of spoken input and output," Cohen said. "Wherever it makes sense, we want it to be available with very high performance."

They can hear you now

Cohen, who founded speech technology company Nuance Communications before coming to Google in 2004, has been working in this field for 26 years. At Google, his job has been to apply cutting-edge speech recognition and synthesis technology to Google services, starting with GOOG-411 in 2007 and voice search in 2008.

At this point, most leading speech-technology systems have settled on a basic architecture, Cohen said. The first step involves analysing incoming sound waves in 10-millisecond batches, identifying subtleties in pitch and range to create a digital representation of those sounds. Then comes the hard part: taking those batches and attempting to match them against the billions of combinations of sounds that make up words in the English language. (The process is the same for other languages, but the number of sound combinations differs from language to language.)
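
As a rough illustration of that first step, the sketch below cuts a signal into 10-millisecond batches and computes two simple measurements for each one. It is not Google's code: the 16 kHz sample rate, the synthetic 440 Hz test tone, the function names and the choice of features (loudness and zero crossings) are all assumptions made for the example, and production systems typically use overlapping windows and richer spectral features.

```python
import math


def split_into_frames(samples, sample_rate_hz, frame_ms=10):
    """Split samples into consecutive, non-overlapping frames of frame_ms each."""
    frame_len = sample_rate_hz * frame_ms // 1000
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    # Any trailing partial frame is dropped to keep the sketch simple.
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


def frame_features(frame):
    """Return (RMS energy, zero-crossing count) for one frame."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    # A sign change between neighbouring samples is a crude cue for pitch.
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return rms, crossings


# Assumed input: one second of a synthetic 440 Hz tone sampled at 16 kHz.
sample_rate_hz = 16000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate_hz) for n in range(sample_rate_hz)]

frames = split_into_frames(tone, sample_rate_hz)
features = [frame_features(f) for f in frames]
print(len(frames), "frames of", len(frames[0]), "samples each")
```

Each pair of numbers in `features` stands in for the "digital representation" of one batch; the matching step described above would then work on sequences of such representations rather than on raw audio.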

"It's fundamentally a big statistical model," Cohen said. Google's method and other speech-recognition models analyse the sounds for their acoustic quality to identify "phonemes", (a basic sound unit of a word, such as "ooo" in "Google"), how those phonemes form individual words, and how grammar informs the construction of those words into sentences.

In terms of its basic approach, Google's not doing anything different from others who implement speech technologies. Nuance's Dragon NaturallySpeaking enjoys quite a following among those interested in this area. Microsoft and Apple have spent years, and plenty of money, researching voice-recognition technology for their desktop operating systems. Start-ups like Vlingo are putting such technology on mobile computers.

Naturally, however, Cohen thinks Google has a few advantages.

Time and data

Speech recognition is an extremely compute-intensive problem, with a lot of resources required to decode even simple voice commands or requests in seconds. Fortunately for him, Cohen happens to work for a company with one of the world's largest reservoirs of computing resources.

And as everyone knows, Google has accumulated a vast amount of data on human speech patterns, both from the queries people type into its search engine every day and from the more than 10 million books it has digitised as part of its Google Book Search project.

The combination allows Google to manipulate very large data sets when it is processing speech recognition queries, and that is "one of the reasons that we've made some big advances", Cohen said. He thinks Google can deliver more accurate results more quickly because of this ability to crunch huge amounts of new data and verify it against older data.

Google's most visible work has shown up in its Android mobile operating system, where Android users can click on a little microphone button on the home search page to use their voices to search the web or launch certain applications. At an event earlier this month, Google mobile product managers said Android users are placing about one out of every four search queries using the microphone.

But Google has also released technology that gives YouTube users a way to automatically caption their videos. Its Google Voice application transcribes voice mails left on Google Voice accounts into text, occasionally with hilarious results. And Google told The Times in the UK that it's working on a "translator phone" that would let users speak a sentence into the phone and have a translated version repeated over a speaker.

Sound barriers

Few would argue, however, that Google or anyone in the industry has achieved truly reliable speech-recognition technology. What's holding the company back?

The most basic issue at the moment is simple background noise, Cohen said. Mobile users on the go face interference from wind, background conversations or traffic noise that can distort the sounds captured in that first part of the recognition systems. Better microphones could help, but the systems have to get better at dealing with such interference, Cohen said.

Another major problem is the complexity of anticipating what people might say and accurately transcribing it into text. This isn't just about accents or dialects (Cohen, in a wry Brooklyn accent, recalled a speech technology professor who warned him that no one speaks correctly); nicknames, slang and rushed or incomplete sentences can also confuse the smartest algorithms.

Google has noticed that people use voice search the way they search on Google, speaking in keywords and phrases like "restaurants in Palo Alto". That makes it easier to predict what a collection of sounds means in a search context, since it can cross-reference the speech it has recognised against a database of search queries. Voice mails, on the other hand, are almost completely unpredictable, especially because Google does not maintain a similar database of voice mails due to privacy concerns, he said.
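
To see why a database of typed queries helps, consider a minimal sketch in which a recogniser has produced several candidate transcriptions, each with an acoustic score, and a query log is used to favour the phrasing people actually search for. Everything here is hypothetical: the query counts, the acoustic scores, the smoothing scheme and the function names are invented for illustration and do not describe Google's system.

```python
import math

# Hypothetical query log: phrase -> number of times it was typed.
QUERY_COUNTS = {
    "restaurants in palo alto": 900,
    "restaurants in palo verde": 40,
    "rest or aunts in palo alto": 0,
}


def log_prior(phrase, counts, smoothing=1.0):
    """Smoothed log probability of a phrase under the query log.

    One extra smoothing slot is reserved for phrases missing from the log,
    so unseen candidates get a small but non-zero probability.
    """
    total = sum(counts.values()) + smoothing * (len(counts) + 1)
    return math.log((counts.get(phrase, 0) + smoothing) / total)


def rescore(candidates, counts, prior_weight=1.0):
    """Return the candidate with the best acoustic log score plus weighted log prior."""
    return max(candidates, key=lambda c: c[1] + prior_weight * log_prior(c[0], counts))


# Hypothetical recogniser output: (transcription, acoustic log score).
candidates = [
    ("rest or aunts in palo alto", -10.0),
    ("restaurants in palo alto", -10.5),
    ("restaurants in palo verde", -11.0),
]

acoustic_only = max(candidates, key=lambda c: c[1])[0]
best_phrase, _ = rescore(candidates, QUERY_COUNTS)
print("acoustic only:", acoustic_only)
print("with query log:", best_phrase)
```

On sound alone the garbled "rest or aunts" candidate narrowly wins in this toy example; once the query log is taken into account, the common search phrase comes out on top. With no comparable log for voice mails, that second source of evidence is missing.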

So while plenty of challenges remain, there's a sense both inside and outside Google that speech technology is on the cusp of becoming something people expect rather than a feature that a few devotees covet. It may take some getting used to, but we're already seeing people abandon computer input methods designed for another era, the keyboard and mouse, in favour of touchscreens and voice commands.

It's not about "killing" an older input method; it's about providing alternatives. "You just want people to assume that if they feel like talking, they can, and if they feel like typing, they can," Cohen said.

Via CNET
