Are you ready for the next shift in computing? With faster processors and cheap RAM, voice interfaces are becoming more of a reality.
For example, Microsoft has voice-enabled Office XP. Another technology that's showing a lot of promise is the convergence of Web and telephony using VoiceXML.
VoiceXML is a new flavor of XML that defines structures for playing prerecorded voice prompts, as well as generating speech from text, for presentation to the user over the telephone. The user's response is captured through either DTMF (touch-tone) input or speech recognition.
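As a rough illustration of those two output styles, here is a minimal sketch of a VoiceXML page. It assumes VoiceXML 2.0 syntax, and the file name welcome.wav and the form id are hypothetical. The first prompt plays a prerecorded file, with the enclosed text used as a text-to-speech fallback if the audio cannot be played; the second prompt is plain text-to-speech.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="welcome">
    <block>
      <!-- Prerecorded prompt; the enclosed text is spoken only if
           welcome.wav (a hypothetical file) cannot be played. -->
      <prompt>
        <audio src="welcome.wav">Welcome to the restaurant guide.</audio>
      </prompt>
      <!-- Text-to-speech prompt: the voice browser synthesizes this text. -->
      <prompt>Please stay on the line.</prompt>
    </block>
  </form>
</vxml>
```

The exact version attribute and namespace a given voice browser expects may differ from this sketch.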
The World Wide Web Consortium's (W3C) "Voice Browser" activity is defining the standards for VoiceXML through its working drafts. The W3C is working to expand access to the Web by allowing people to interact with Web sites via spoken commands. This technology allows any telephone to access Web-based services and is especially helpful to people with disabilities. It will also improve interaction with display-based Web content in cases where a mouse and keyboard are missing or inconvenient.
Developers writing VoiceXML set up a <field> section so a phone application can "listen" for caller commands. Just as text boxes on HTML pages receive the user's keyboard input, fields in VoiceXML pages receive the caller's voice or DTMF input. Enclosed within the <field> tag are child tags that control the program flow. The following are examples of <field> child tags:
- <grammar>: The grammar specifies the collection of possible caller inputs that a field should listen for. Fields in VoiceXML cannot take a best guess at what the caller said when listening to arbitrary inputs. A field must know ahead of time the full set of inputs to expect, although the grammar can be very large.
- <prompt>: The prompt asks a caller for input, for example, "Say the name of a restaurant" or "Say or dial a ten-digit phone number."
- <nomatch>: This tag becomes active whenever the caller provides an input that is not found in the field's grammar.
- <noinput>: This tag becomes active whenever the caller fails to provide any input in response to a field prompt.
- <filled>: When the caller provides a recognized spoken or DTMF command, the filled section becomes active. This tag is used primarily to determine the application's control flow in response to a caller command.
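To show how these child tags fit together, here is a minimal sketch of a single field. It assumes VoiceXML 2.0 syntax with an inline XML grammar; the field name drink, the form id, and the three menu choices are hypothetical and chosen only for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="order">
    <field name="drink">
      <!-- <prompt>: asks the caller for input. -->
      <prompt>Would you like coffee, tea, or milk?</prompt>

      <!-- <grammar>: the complete set of inputs the field listens for. -->
      <grammar type="application/srgs+xml" version="1.0" root="choice" mode="voice">
        <rule id="choice">
          <one-of>
            <item>coffee</item>
            <item>tea</item>
            <item>milk</item>
          </one-of>
        </rule>
      </grammar>

      <!-- <noinput>: runs when the caller says nothing. -->
      <noinput>
        <prompt>Sorry, I did not hear anything.</prompt>
        <reprompt/>
      </noinput>

      <!-- <nomatch>: runs when the input is not in the grammar. -->
      <nomatch>
        <prompt>Sorry, I did not understand that.</prompt>
        <reprompt/>
      </nomatch>

      <!-- <filled>: runs once a recognized input fills the field. -->
      <filled>
        <prompt>You chose <value expr="drink"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```

In this sketch, a caller who says "tea" triggers the filled section, while silence or an out-of-grammar word such as "juice" triggers noinput or nomatch and the question is asked again. A real application would typically use the filled section to branch to another form or submit the value to a server.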
© 2001 TechRepublic, Inc.