Google wants to talk

Google's Mike Cohen won't be satisfied until anyone who wants to talk to their computer can do so without laughing at the hideous translation or sighing in frustration.

Cohen, a leading figure in speech technology circles, heads up Google's efforts to advance the science of speech technology while applying it to as many products as possible. "Google's mission is to organise the world's information, and it turns out a lot of the world's information is spoken," Cohen said, in a recent interview with ZDNet Australia's sister site CNET about the search giant's speech ambitions.

Google is attempting to produce voice-recognition technology that fits in with its view that the computing universe is shifting toward mobile devices and browser-based applications. That means easy-to-use software that does the heavy lifting in the datacentre, so that it can run over the internet on a mobile device with limited hardware.

Computer speech recognition seems like it has been five to 10 years away for decades. Indeed, the electronics and computer industries have been chasing the goal of voice-directed computers for nearly 100 years, ever since a simple wooden toy dog called Radio Rex, released in 1911, first captivated children and adults by shooting out of his doghouse (at least some of the time) when his owners called for "Rex!". (Cohen owns one of the few remaining gadgets.)

Huge advances have obviously been made since the 1920s, yet few of us use our computers like HAL in 2001: A Space Odyssey or KITT, the computerised car in Knight Rider. Cohen, however, believes the industry is about to silence the jokes about amusingly garbled voice mails as speech recognition models grow more sophisticated, engineers pack mobile computing devices with more sophisticated hardware, and users start to realise that performance has made great strides.

"The goal is complete ubiquity of spoken input and output," Cohen said. "Wherever it makes sense, we want it to be available with very high performance."

They can hear you now

Cohen, who founded speech technology company Nuance Communications before coming to Google in 2004, has been working in this field for 26 years. At Google, his job has been to apply cutting-edge speech recognition and synthesis technology to Google services, starting with GOOG-411 in 2007 and voice search in 2008.

At this point, most leading speech-technology systems have settled on a basic architecture, Cohen said. The first step involves analysing incoming sound waves in 10-millisecond batches, identifying subtleties in pitch and range to create a digital representation of those sounds. Then comes the hard part, taking those batches and attempting to match them against the billions of combinations of sounds that make up words in the English language. (The process is the same for other languages, but the number of sound combinations differs from language to language.)
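As a rough illustration of that first step, the sketch below chops a signal into 10-millisecond batches and computes two crude measurements for each one. The 16kHz sample rate, the test tone and the function names are assumptions made for the example; real recognisers extract far richer features, and nothing here reflects Google's actual code.

```python
import math

def frame_signal(samples, sample_rate, frame_ms=10):
    """Split samples into consecutive, non-overlapping frames of frame_ms each."""
    frame_len = sample_rate * frame_ms // 1000
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def frame_features(frame):
    """Return (RMS energy, zero-crossing count) for one frame."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    # A zero crossing is a sign change between neighbouring samples.
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return rms, crossings

# Assumed input: one second of a 440 Hz tone sampled at 16 kHz.
sample_rate = 16000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]

frames = frame_signal(tone, sample_rate)
features = [frame_features(f) for f in frames]
```

Each entry in `features` is a small numeric summary of one 10-millisecond slice, which is the kind of digital representation the later matching stage works from.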

"It's fundamentally a big statistical model," Cohen said. Google's method and other speech-recognition models analyse the acoustic quality of the sounds to identify "phonemes" (a basic sound unit of a word, such as "ooo" in "Google"), work out how those phonemes form individual words, and use grammar to judge how those words are likely to combine into sentences.
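A toy version of such a statistical model might look like the sketch below. It picks between two candidate transcriptions by adding an acoustic score (how well the words match the sounds) to a language score (how plausible the word sequence is). Every probability in it is invented for illustration, and the simple word-pair model is a stand-in for the grammar component, not a description of Google's system.

```python
import math

# Hypothetical acoustic scores: P(sounds | candidate transcription).
acoustic = {
    "recognise speech": 0.40,
    "wreck a nice beach": 0.45,
}

# Hypothetical word-pair (bigram) probabilities standing in for grammar.
bigram = {
    ("<s>", "recognise"): 0.02, ("recognise", "speech"): 0.30,
    ("<s>", "wreck"): 0.005, ("wreck", "a"): 0.20,
    ("a", "nice"): 0.05, ("nice", "beach"): 0.02,
}

def lm_log_prob(sentence, floor=1e-6):
    """Sum log-probabilities of each adjacent word pair; unseen pairs get a floor."""
    words = ["<s>"] + sentence.split()
    return sum(math.log(bigram.get(pair, floor)) for pair in zip(words, words[1:]))

def decode(candidates):
    """Return the candidate with the best combined acoustic and language score."""
    return max(candidates, key=lambda s: math.log(candidates[s]) + lm_log_prob(s))

best = decode(acoustic)
```

On sound alone the second candidate scores slightly higher, but the language score tips the decision toward "recognise speech".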

In terms of its basic approach, Google's not doing anything different from others who implement speech technologies. Nuance's Dragon NaturallySpeaking enjoys quite a following among those interested in this area. Microsoft and Apple have spent years, and plenty of money, researching voice-recognition technology for their desktop operating systems. Start-ups like Vlingo are putting such technology on mobile computers.

Naturally, however, Cohen thinks Google has a few advantages.

Time and data

Speech recognition is an extremely compute-intensive problem, with a lot of resources required to decode even simple voice commands or requests in seconds. Fortunately for him, Cohen happens to work for a company with one of the world's largest reservoirs of computing resources.

And as everyone knows, Google has accumulated a vast amount of data on human speech patterns, both from the queries people type into its search engine every day and from the more than 10 million books it has digitised as part of its Google Books Search project.

The combination allows Google to manipulate very large data sets when it is processing speech recognition queries, and that is "one of the reasons that we've [made] some big advances", Cohen said. He thinks Google can deliver more accurate results more quickly because of this ability to crunch huge amounts of new data and verify it against older data.

Google's most visible work has shown up in its Android mobile operating system, where Android users can click on a little microphone button on the home search page to use their voices to search the web or launch certain applications. At an event earlier this month, Google mobile product managers said Android users are placing about one out of every four search queries using the microphone.

But Google has also released technology that gives YouTube users a way to automatically caption their videos. Its Google Voice application transcribes voice mails left on Google Voice accounts into text, occasionally with hilarious results. And Google told The Times in the UK that it's working on a "translator phone" that would let users speak a sentence into the phone and have a translated version repeated over a speaker.

Sound barriers

Few would argue, however, that Google or anyone in the industry has achieved truly reliable speech-recognition technology. What's holding the company back?

The most basic issue at the moment is simple background noise, Cohen said. Mobile users on the go face interference from wind, background conversations or traffic noise that can distort the sounds captured in that first part of the recognition systems. Better microphones could help, but the systems have to get better about dealing with such interference, Cohen said.

Another major problem is the complexity of anticipating what people might say and accurately transcribing it into text. This isn't just about accents or dialects (Cohen, in a wry Brooklyn accent, recalled a speech technology professor who warned him that no one speaks correctly); nicknames, slang and rushed or incomplete sentences can also confuse the smartest algorithms.

Google has noticed that people use voice search the way they search on Google, speaking in keywords and phrases like "restaurants in Palo Alto". That makes it easier to predict what a collection of sounds means in a search context, since Google can cross-reference the speech it has recognised against a database of search queries. Voice mails, on the other hand, are almost completely unpredictable, especially because Google does not maintain a similar database of voice mails due to privacy concerns, he said.
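One way to picture that cross-referencing is the sketch below, which re-ranks a short list of candidate transcriptions using how often each one appears in a log of typed queries. The query counts, the acoustic scores and the `rescore` function are all hypothetical, and the article does not say how Google actually combines the two sources.

```python
import math
from collections import Counter

def rescore(n_best, query_counts, weight=1.0):
    """Pick the hypothesis with the best acoustic score plus query-log bonus."""
    def combined(item):
        text, acoustic_log_score = item
        # Adding 1 keeps unseen queries at a bonus of zero rather than log(0).
        return acoustic_log_score + weight * math.log(query_counts[text] + 1)
    return max(n_best, key=combined)[0]

# Hypothetical counts of previously typed search queries.
query_counts = Counter({
    "restaurants in palo alto": 500,
    "restaurants in paolo alto": 2,
})

# Hypothetical recogniser output: (transcription, acoustic log score).
n_best = [
    ("rest or ants in palo alto", -10.0),
    ("restaurants in palo alto", -10.5),
    ("restaurants in paolo alto", -10.2),
]

chosen = rescore(n_best, query_counts)
acoustic_only = rescore(n_best, query_counts, weight=0.0)
```

With the query log switched off (`weight=0.0`) the garbled first candidate wins on sound alone; with it switched on, the commonly typed phrase is chosen. No comparable log exists for voice mails, which is the gap Cohen describes.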

So while plenty of challenges remain, there's a sense both inside and outside Google that speech technology is on the cusp of becoming something people expect rather than a feature that a few devotees covet. It may take some getting used to, but we're already seeing people abandon computer input methods designed for another era — the keyboard and mouse — in favour of touchscreens and voice commands.

It's not about "killing" an older input method; it's about providing alternatives. "You just want people to assume that if they feel like talking, they can, and if they feel like typing, they can," Cohen said.

Via CNET
