Can VoiceXML speak to Web developers?

Editor's note: This article originally appeared in TechRepublic's Web Development Zone TechMail. Subscribe, and you'll receive information on Web-development related projects and trends.

Are you ready for the next shift in computing? With faster processors and cheap RAM, voice interfaces are becoming more of a reality.

For example, Microsoft has voice-enabled Office XP, the current version of its Office suite. Another technology that's showing a lot of promise is the convergence of Web and telephony using VoiceXML.

VoiceXML is a new flavor of XML that defines structures for playing prerecorded voice prompts and generating text-to-speech output for presentation to the user over the telephone. The user's response is captured through either DTMF (touch-tone) input or speech recognition.

The World Wide Web Consortium (W3C) defines the standards for VoiceXML through the working drafts of its "Voice Browser" activity. The W3C is working to expand access to the Web by allowing people to interact with Web sites via spoken commands. This technology allows any telephone to access Web-based services and is especially helpful to people with disabilities. It will also improve interaction with display-based Web content in cases where a mouse and keyboard are missing or inconvenient.

Developers writing VoiceXML set up a <field> section so a phone application can "listen" for caller commands. Just as text boxes on HTML pages receive the user's keyboard input, fields in VoiceXML pages receive the caller's voice or DTMF input. Enclosed within the <field> tag are child tags that control the program flow. The following are examples of <field> child tags:

  • <grammar>: The grammar specifies the collection of possible caller inputs that a field should listen for. Fields in VoiceXML cannot take a best guess at what the caller said when listening to arbitrary inputs. A field must know ahead of time the full set of inputs to expect, although the grammar can be very large.
  • <prompt>: The prompt asks a caller for input, for example, "Say the name of a restaurant" or "Say or dial a ten-digit phone number."
  • <nomatch>: This tag becomes active whenever the caller provides an input that is not found in the field's grammar.
  • <noinput>: This tag becomes active whenever the caller fails to provide any input in response to a field prompt.
  • <filled>: When the caller provides a recognized spoken or DTMF command, the filled section becomes active. This tag is used primarily to decide where the application goes next in response to a caller command.
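
To make the tags above concrete, here is a minimal sketch in Python. It embeds a small, hypothetical VoiceXML page (the form name "order", the field name "drink", the prompts, and the three-word inline grammar are all illustrative assumptions, written in the style of VoiceXML 1.0) and then imitates, in a very simplified way, which child tag would become active for a given caller input. The helper names are invented for this example; this is not a voice browser, and a real platform performs speech recognition rather than string matching.

```python
import xml.etree.ElementTree as ET

# Hypothetical VoiceXML page: a single field that listens for a drink name.
# All names and prompts here are illustrative assumptions.
VXML = """
<vxml version="1.0">
  <form id="order">
    <field name="drink">
      <prompt>Say coffee, tea, or milk.</prompt>
      <grammar type="application/x-jsgf">coffee | tea | milk</grammar>
      <noinput>I did not hear anything. <reprompt/></noinput>
      <nomatch>I did not understand that. <reprompt/></nomatch>
      <filled>
        <prompt>You chose <value expr="drink"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
"""


def field_children(vxml_text):
    """Return the tag names nested directly inside the first <field>."""
    field = ET.fromstring(vxml_text).find(".//field")
    return [child.tag for child in field]


def grammar_phrases(vxml_text):
    """Split the simple 'a | b | c' inline grammar into a set of phrases."""
    grammar = ET.fromstring(vxml_text).find(".//field/grammar")
    return {phrase.strip() for phrase in grammar.text.split("|")}


def handler_for(caller_input, phrases):
    """Name the child tag that would become active for a given input.

    A plain string comparison stands in for speech or DTMF recognition.
    """
    if caller_input is None or not caller_input.strip():
        return "noinput"   # the caller gave no input
    if caller_input.strip().lower() in phrases:
        return "filled"    # the input is in the field's grammar
    return "nomatch"       # the input is not in the field's grammar


if __name__ == "__main__":
    print(field_children(VXML))
    phrases = grammar_phrases(VXML)
    for said in ("tea", "lemonade", None):
        print(repr(said), "->", handler_for(said, phrases))
```

Because the field only knows the three phrases in its grammar, "tea" leads to the filled section, "lemonade" leads to nomatch, and silence leads to noinput.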

TechRepublic is the online community and information resource for all IT professionals, from support staff to executives. We offer in-depth technical articles written for IT professionals by IT professionals. In addition to articles on everything from Windows to e-mail to firewalls, we offer IT industry analysis, downloads, management tips, discussion forums, and e-newsletters.

© 2001 TechRepublic, Inc.
