In preparation for this new wave of voice-based Web access, Dr Rolf Schwitter, a lecturer at Sydney's Macquarie University, has integrated training in VoiceXML into an introductory course on Web technology.
"We cover basic speech technologies, speech synthesis and text-to-speech, then expand into an introduction to VoiceXML," Schwitter explains.
Schwitter believes that as speech technology becomes more prevalent, developers will need to understand both the engineering requirements of implementing a solution and the psychological and linguistic requirements of writing a dialogue flow.
"You have to ask the right questions to get the right information," Schwitter says. "If [students] are interested after having completed the first phase of the course, we have a unit called Interactive Natural Language Systems, devoted to the subject."
The course material he has helped to develop was produced in conjunction with industry partners such as Motorola and Philips.
"We are formulating the course based on what they want from their future employees, and teaching students what they will have to know," Schwitter says.
Traversing the technological plateau
While vendors are characterising the next phase of development in speech recognition technology as focussed on applications and integration, researchers in the field recognise that the underlying technology has largely reached a plateau.
Dr Steve Cassidy, senior lecturer in computing at Macquarie, says a growing body of academic literature is discussing what the next quantum leap in the technology might be.
"As far as the vendors are concerned, the technology is at the state where you can do lots of useful things with it, and while the researchers are always trying to push things as far as they can, most of the work being done is based on incremental changes," Cassidy says.
Dr David Grayden, research fellow at the Bionic Ear Institute in Melbourne, believes that alongside advances in processing power there have been three main advances in speech recognition technology.
"The first breakthrough was the introduction of data-based approaches, rather than trying to understand every little speech event. Then came dynamic time warping, which enabled the software to compare incoming speech with stored versions of the speech," says Grayden. "Next came hidden Markov models, which allowed continuous speech to be recognised and form the basis of dictation-type models."
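To make the idea of comparing incoming speech with stored versions concrete, here is a minimal Python sketch of dynamic time warping used for template matching. It assumes, purely for illustration, that each utterance has already been reduced to a sequence of single numbers, one per frame (real recognisers compare multi-dimensional acoustic feature vectors), and the names `dtw_distance` and `recognise` and the word templates are hypothetical, not taken from any system mentioned here.

```python
import math
from typing import Dict, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic dynamic time warping distance between two 1-D sequences.

    cost[i][j] holds the cheapest alignment of a[:i] with b[:j]; each step may
    advance one sequence, the other, or both, so a slowly spoken word can
    still line up with a faster stored template.
    """
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # frame-to-frame distance
            cost[i][j] = local + min(
                cost[i - 1][j],      # stretch the template
                cost[i][j - 1],      # stretch the incoming speech
                cost[i - 1][j - 1],  # advance both together
            )
    return cost[n][m]


def recognise(incoming: Sequence[float], templates: Dict[str, Sequence[float]]) -> str:
    """Return the stored word whose template warps most cheaply onto the input."""
    return min(templates, key=lambda word: dtw_distance(incoming, templates[word]))


# Hypothetical stored templates (made-up numbers, for illustration only).
templates = {"yes": [1, 3, 4, 3, 1], "no": [2, 2, 0, 0, 2]}
# A slower rendition of "yes": some frames are repeated.
incoming = [1, 1, 3, 4, 4, 3, 1]
print(recognise(incoming, templates))
```

Because the warping absorbs the repeated frames, the slowed-down input aligns with the "yes" template at zero cost. The sketch also shows why the approach suited isolated words but not continuous speech: every word needs its own stored template and a separate full comparison.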
While conceding he is probably in the minority among engineering-focussed researchers, Grayden argues that an earlier move away from integrating linguistics and physiology into speech recognition research has ultimately proven detrimental.
"In the early days there was a notion that every time the research became more engineering-based there was a leap forward in the technology," Grayden says. "I believe that is why it plateaued. There is now a need for a breakthrough, something new that will give us a jump in performance, and I believe it will come from a mixture of skills, including computer engineering, linguistics and physiology."
In the meantime, Cassidy is focussing on training graduates for an employment market where voice application development is likely to provide the bulk of the work opportunities.
"Customer acceptance is growing, and it is fairly inevitable that these kinds of voice systems will take off and that the possibility for a bigger voice industry is already there, even given the limitations of the current technologies," Cassidy surmises.