This story was printed from ZDNet Australia.
PDAs find their own voice

By Peter Kruger, IT Week
January 15, 2001
URL: http://www.zdnet.com.au/news/communications/soa/PDAs-find-their-own-voice/0,130061791,120108145,00.htm


Application vendors and hardware manufacturers plan to use speech input and output in mobile client devices, although the advantages are questionable for many applications, says Peter Kruger.

When mobile data was just beginning, and the only client product selling in quantity was a chunky device from Nokia, Dresdner Bank began testing a simple mobile banking application. The application offered restricted SMS-based access to accounts for customers who already used an existing phone banking service.

After a number of weeks the project leader was decidedly downbeat about the experiment. It seemed that very few users were completing the data sessions; most were instead using their handsets to speak to operators in Dresdner Bank's call centre. This was evidence, arguably, of the convenience of a voice interface for mobile commerce.

The development of increasingly sophisticated software to support voice input and output could make mobile data systems easier and more attractive to use. There certainly are applications where the user could benefit from hands- and eyes-free operation.

However, it is uncertain how helpful voice will be for general applications. It could be that at the moment speech I/O is a feature too far, placing an excessive burden on the client and the network, and increasing the complexity of installation.

Poor user interfaces

Currently, most mobile users would rather speak to someone than scroll and press a thumb-wheel or rely on a screen the size of a postage stamp. As Adam Anger, business manager at Microsoft Europe's Mobile Solutions Unit, says, '[Mobile] users are not finding it easy to access a device via "tap" in some situations.'

Thus many of the potential advantages of the mobile Internet are outweighed by a poor user interface. A voice call can be conducted without using hands or eyes, but PDAs tend to need both. So vendors of mobile Internet software, as well as some hardware suppliers, are hoping that voice I/O technology can help them to compete. Many mobile component suppliers and software vendors are now seeking, or developing, technologies for speech input.

Microsoft has spent years trying to build speech input and output into its desktop computing applications. In October 1999, in an effort to beef up its Speech Application Interface (SAPI), the company acquired Entropic, which is based in Cambridge. Entropic gives Microsoft access to some of the speech recognition technology developed at Cambridge University.

By April 2000 enough progress had been made for Microsoft to announce that SAPI 5.0 would support the Lernout & Hauspie ASR1600 speech recognition engine and TTS3000 text-to-speech engine. The SAPI 5.0 Software Development Kit included SAPI middleware and Lernout & Hauspie's ASR and TTS engines, as well as source code, tools and documentation. While Microsoft was distributing tools to developers of speech-enabling applications, the firm was also carrying out work to voice-enable mobile devices.

In March, Bill Gates demonstrated a device called MiPad. The MiPad ran on Windows CE and was linked to an NT server. It used a continuous speech recognition engine with a 64,000-word vocabulary. But, as Anger explains, the MiPad was not a pure voice-activated device. 'It had a tap-and-talk interface, providing the best of both technologies,' he says. Basically this means that a built-in microphone is activated when a field is selected. By combining tap with talk, the number of possible instructions that the device can expect to hear at any one time is narrowed down.
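The narrowing effect of tap-and-talk can be illustrated with a toy sketch. It is not how MiPad worked internally: the field names, the word lists and the `recognise` function are all hypothetical, and fuzzy text matching from Python's standard `difflib` module stands in for a real acoustic recogniser. The point is only that tapping a field shrinks the set of words the device has to tell apart.

```python
import difflib

# Hypothetical per-field vocabularies; names and words are illustrative only.
FIELD_VOCABULARY = {
    "month": ["january", "february", "march", "april", "may", "june"],
    "action": ["send", "reply", "delete", "forward"],
}


def recognise(field, heard, cutoff=0.6):
    """Return the closest word allowed in the tapped field, or None.

    `heard` is a rough text guess at what was said; a real engine would
    score audio rather than spelling (assumption made for this sketch).
    """
    candidates = FIELD_VOCABULARY[field]
    matches = difflib.get_close_matches(
        heard.lower(), candidates, n=1, cutoff=cutoff
    )
    return matches[0] if matches else None
```

With the 'month' field tapped, a misheard 'marsh' resolves to 'march'; with the 'action' field tapped, the same input is rejected because nothing in that small vocabulary is close enough.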

Microsoft has not yet announced a date for incorporating speech technology into mobile products. The uncertain future of Lernout & Hauspie, currently under threat of bankruptcy, also complicates the situation.

When they do appear, early products will have limitations. As well as probably requiring the use of a plastic stylus, Microsoft's speech engine may also be dependent on a connection to a server. This could prove unsuitable for users who want to use voice technology to initiate or reinstate a connection to the network.

Meanwhile, UK-based mobile technology specialist TTPCom is working on speech recognition that can be embedded in the client, using Smartspeak from Art. Until recently Art had specialised in handwriting recognition, but in April TTPCom took Art's speech technology and put it in the GSM chipset that it co-developed with Analog Devices. The chipset enables users to dial a number using natural speech.

TTPCom is already working with the next version of Smartspeak, which has a larger vocabulary. 'Not 64,000 words, but certainly measured in 100s,' says Richard Fry, sales director of TTPCom. The software will be stored in flash memory. 'This will enable us to get the product to the market a lot faster,' says Fry, who adds that the next version of Smartspeak could be on the market in the second quarter of this year.

Text-to-speech

Many applications will require speech output as well as input. Although unified messaging providers are already providing text-to-speech, these facilities are typically hosted on a server.

Force Computers, an embedded systems manufacturer, is using DecTalk text-to-speech technology in its StrongARM- and Intel-based wireless devices. DecTalk was first developed by Digital Equipment Corporation (DEC), before being acquired by Compaq, which then sold it to Force's parent company, Solectron. Force is still developing the product, says Carl Leber, product manager for DecTalk. 'One of the things that mobile vendors are concerned with is footprint size,' he says. So a product that was once shoe-horned into a PC is now being squeezed into a mobile client.

In the same way that it is unusual for two people to have the same handwriting, it is unlikely that two users' voices will be exactly the same. The software engineer can choose to design a large product that is as comprehensive as possible and resilient in the face of users with different accents and rates of speech. The alternative is a compact product that restricts the number of words it expects at any one time.

Tap-and-talk, the approach used by MiPad, provides context-sensitive speech recognition. However, even this does not reduce the software's footprint to a size that easily fits on a mobile client. With current technology, the most effective way to squeeze voice recognition into a mobile device is to trim the application to the point where it is only looking for specific, and very distinct, words.
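What 'very distinct' might mean in practice can be sketched as a check on a proposed command list. The example below is a rough illustration under a loud assumption: it uses spelling similarity (via the standard `difflib` module) as a crude proxy for how alike two words sound, which a real product would measure acoustically. The function name and the threshold are hypothetical.

```python
import difflib
from itertools import combinations


def confusable_pairs(words, threshold=0.7):
    """List word pairs whose spelling similarity meets the threshold.

    Spelling similarity is only a stand-in for acoustic similarity
    (an assumption of this sketch, not a property of any real engine).
    """
    pairs = []
    for first, second in combinations(words, 2):
        ratio = difflib.SequenceMatcher(None, first, second).ratio()
        if ratio >= threshold:
            pairs.append((first, second))
    return pairs
```

Run on a list such as 'call', 'ball', 'dial' and 'hang', the check flags only 'call' and 'ball', suggesting that one of the two should be replaced before the vocabulary is fixed.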

DecTalk software originally consisted of 160,000 lines of C code, but Force says that the product is now small enough for use in mobile devices. 'We are licensing the software to people who want to put the module onto chips,' says Leber.

One advantage that text-to-speech has over speech-to-text is that text has less uncertainty, so it is easier to interpret. But speech output depends on a phonetic rule engine, which can sound lumpy and mechanical. On a PC or server this problem can be overcome, to some extent, by providing a larger dictionary and more sounds and words. It is also possible to add extra digital signal processing (DSP) hardware. Neither of these options is easy or, in many cases, possible when implementing speech on a mobile client.
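The trade-off between a rule engine and a larger dictionary can be shown with a toy example. This is a minimal sketch, not DecTalk's design: the rule tables, the exception dictionary and the `to_phonemes` function are invented for illustration, and the phoneme symbols are only loosely modelled on ARPAbet. Words found in the dictionary get a hand-written pronunciation; everything else falls back to simple letter-to-sound rules.

```python
# Exception dictionary: words the rules below would pronounce badly.
EXCEPTIONS = {
    "one": ["W", "AH", "N"],
    "phone": ["F", "OW", "N"],
}

# Two-letter patterns are tried before single letters.
DIGRAPHS = {"sh": "SH", "ch": "CH", "th": "TH", "ee": "IY", "oo": "UW"}

# One default sound per letter; "x" expands to two phonemes.
LETTERS = {
    "a": "AE", "b": "B", "c": "K", "d": "D", "e": "EH", "f": "F",
    "g": "G", "h": "HH", "i": "IH", "j": "JH", "k": "K", "l": "L",
    "m": "M", "n": "N", "o": "AA", "p": "P", "q": "K", "r": "R",
    "s": "S", "t": "T", "u": "AH", "v": "V", "w": "W", "x": "K S",
    "y": "Y", "z": "Z",
}


def to_phonemes(word):
    """Convert a word to phonemes: dictionary first, then rules."""
    word = word.lower()
    if word in EXCEPTIONS:
        return list(EXCEPTIONS[word])
    phonemes = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            phonemes.append(DIGRAPHS[pair])
            i += 2
        else:
            # Characters with no rule (digits, punctuation) are skipped.
            phonemes.extend(LETTERS.get(word[i], "").split())
            i += 1
    return phonemes
```

The rules handle a regular word such as 'ship' well enough, but they voice the silent 'e' at the end of 'cheese', which is the kind of mechanical result described above. Each entry added to the dictionary fixes one such word at the cost of memory, which is exactly what a mobile client has least of.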

The choice of whether to put speech software onto the client device or leave it on the server will depend on the particular application. Julia Ferguia, director of communications and product planning at AVT, a company that provides unified messaging systems, says that she sees no reason to cram all the software into the client. '[Processing] happens on our messaging server and over our voice pipe,' she says. On the other hand, this approach would not be suitable for some applications. For example, vendors such as TTPCom point to laws that require hands-free operation when mobile devices are used by drivers.
