Figures of speech

Learning to talk


Despite its benefits, the technical aspects of speech recognition implementations are relatively painless, thanks to more than a decade in which the technology grew from a laboratory fantasy into an inexpensive, practical new interface for human-machine interaction.

This growth came as researchers, dedicated to finding a way to help computers understand a broad range of words and spoken accents, steadily improved their software's ability to recognise discrete phonemes.

Recognising these sounds, which make up the basic units of all speech, lies at the core of today's speech recognition engines.

In the early days, software simply tried to compare the sound waves of spoken words with those stored in a massive database of recorded words.

But this approach was quite clunky, forcing speakers to pause after each word and demanding a noticeable amount of time for the system to search through its database. Even then, it wasn't particularly accurate: homonyms, tricky accents, background noise, and user unsophistication made early speech recognition an exercise in frustration.
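The early compare-against-a-database approach can be sketched roughly as follows. This is an illustration only: the tiny four-sample "waveforms", the two-word database and the use of Euclidean distance are all assumptions made for the example, not details of any historical system.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length sample sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognise(waveform, templates):
    """Return the word whose stored recording is closest to the input."""
    # Every stored recording is checked in turn, so the search time
    # grows with the size of the database.
    return min(templates, key=lambda word: distance(waveform, templates[word]))


# Hypothetical database: one stored recording per word.
templates = {
    "yes": [0.1, 0.9, 0.4, -0.3],
    "no": [0.7, 0.2, -0.5, -0.1],
}

spoken = [0.2, 0.8, 0.5, -0.2]  # an invented, slightly different "yes"
best_word = recognise(spoken, templates)
print(best_word)
```

Because each utterance is compared with every stored recording as a whole, a sketch like this also hints at why speakers had to pause between words and why larger vocabularies meant longer waits.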

The technology's big breakthrough came in the late 1980s with the introduction of hidden Markov models (HMMs), a mathematical technique for deciding the most probable match between two similar sets of data--in this case, digitised waveforms. Thanks to HMMs, speech recognition systems became far more flexible by gaining the ability to compensate for imperfections in the speaker's voice.
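To give a feel for how an HMM picks the "most probable match", the sketch below runs the Viterbi algorithm, a standard way of finding the most likely sequence of hidden states behind a series of observations. The two phoneme-like states, the "hiss" and "tone" observations and every probability are invented for illustration; a real recogniser works on acoustic features with far larger models.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, path) of the most probable state sequence."""
    # best[s] holds the probability and path of the best sequence
    # that ends in state s after the observations seen so far.
    best = {s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states}
    for obs in observations[1:]:
        new_best = {}
        for s in states:
            # Choose the predecessor state that maximises the probability.
            prob, prev = max(
                (best[p][0] * trans_p[p][s] * emit_p[s][obs], p) for p in states
            )
            new_best[s] = (prob, best[prev][1] + [s])
        best = new_best
    return max(best.values(), key=lambda item: item[0])


# Hypothetical two-phoneme model for a word like "see".
states = ["s", "iy"]
start_p = {"s": 0.8, "iy": 0.2}
trans_p = {"s": {"s": 0.6, "iy": 0.4}, "iy": {"s": 0.1, "iy": 0.9}}
emit_p = {
    "s": {"hiss": 0.9, "tone": 0.1},
    "iy": {"hiss": 0.2, "tone": 0.8},
}

probability, path = viterbi(["hiss", "hiss", "tone"], states, start_p, trans_p, emit_p)
print(path, probability)
```

Because the model scores every candidate sequence by probability rather than demanding an exact waveform match, a noisy or unusual pronunciation lowers the score without necessarily changing the winning answer.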

Over time, computers became more powerful and were able to pick increasingly small phonemic units out of a stream of speech. This paved the way for natural language speech recognition (NLSR), which finally freed users from the need to speak in halting phrases with breaks between words.

Now that NLSR has moved from the desktop to the server side, the technology has been successfully extended into a massively scalable speech processing infrastructure. This infrastructure is typically made up of one or more standard servers clustered together into a high-availability node. As a rule of thumb, plan to have one NLSR server per 30 phone lines.
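The rule of thumb translates into simple arithmetic. The sketch below is a planning aid only: the function name and the example line counts are hypothetical, and the result is rounded up because a partly used server still has to be bought.

```python
LINES_PER_SERVER = 30  # rule-of-thumb ratio quoted above


def servers_needed(phone_lines, lines_per_server=LINES_PER_SERVER):
    """Estimate how many NLSR servers a given number of phone lines requires."""
    if phone_lines < 0 or lines_per_server <= 0:
        raise ValueError("line counts must be non-negative and the ratio positive")
    # Integer ceiling division: any leftover lines need one more server.
    return -(-phone_lines // lines_per_server)


for lines in (30, 31, 180):
    print(lines, "lines ->", servers_needed(lines), "server(s)")
```

An estimate like this covers raw capacity only; a high-availability cluster would usually add spare servers on top.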

"There was a lot of marketing pressure behind speech recognition in the early and mid-1990s," says Clive Summerfield, a longtime speech researcher who founded Syrinx Speech Systems over a decade ago. "But over the past five years, the technology has started to come of age and is now really moving from high-cost, high-value niche applications into more mainstream-style applications."

A one-time Australian success story, Syrinx helped kick-start the global speech recognition market with world-class technology that secured a 6000-line customer care contract with US telecommunications giant AT&T in 1998.

Last year, Syrinx installed 180 lines of speech recognition for online trader ComSec, then gained new management and a new name (Sayso!) before going into voluntary administration after a major investor pulled out this year.

"Speech recognition models have been trained on very large databases of speech to create phoneme models," Summerfield continues. "Vendors have processed those large databases and therefore the technology now is very robust, and the more they're used, the better they become.

"This is a true learning machine, and the first of the artificial learning technologies to find wide-ranging commercial applications. When implemented correctly, it can return an extremely effective return on investment very quickly."
