This story was printed from ZDNet Australia.
Figures of speech
By David Braue, January 30, 2002
URL: http://www.zdnet.com.au/news/communications/soa/Figures-of-speech/0,130061791,120263090,00.htm
We've been talking to, even yelling at, our computers for years. But it's only recently that they started listening. The past year has seen speech recognition grow from being a niche novelty product--initially marketed as a panacea for typing-phobic desktop users--into a major new driver for efficiency in customer care.
This surge in interest has created a revenue boom for companies like Nuance, Speechworks, IBM, VeCommerce, and Philips. These companies are building considerable fortunes cashing in on rapidly growing demand for voice-enabled customer care systems. Designed to automate the completion of basic transactions requiring a standard set of information, such systems have quickly become a major strategic focus for companies struggling to bring down costs.
In Australia, the first commercial speech recognition applications have included gambling, taxi booking services, banking, and tourism and government applications, and new customers are coming online with increasing frequency. TAB Queensland, NSW's TAB Limited, Gold Coast-based Regent Taxis, share trader TD Waterhouse and Sydney taxi network Combined Communications Network are among the technology's earliest adopters here.
Thanks to a convergence of suitable technology, demonstrable benefits from those implementations and the pressure of continuing economic uncertainty, the future for speech recognition looks bright indeed.
Approximately 1500 companies currently employ 200,000 people at around 4000 call centres within Australia, according to figures from industry analyst firm ACA Research Group. Around four percent of those call centres currently use some form of speech recognition, contributing to a total local market that ACA estimates at just under AU$50 million annually.
That market will grow quickly as the technology gains momentum, according to ACA CEO Martin Conboy. "Speech recognition has gotten over the technology hump [of early scepticism]," he explains.
"Speech recognition is ideal for distributing consistent information, and for uncomplex transactions that require low involvement from people--such as making a booking, account enquiry, paying a bill, or placing a bet. Early adopters see themselves as having a service differentiation and a competitive advantage, and we know that around 45 percent of the call centre market has a watching brief [on it]."
Over the next 24 months, Conboy believes the market will grow rapidly to include 20 percent of Australia's call centres, particularly as increasing demand on company services forces suppliers to pursue more economical alternatives to employing additional customer service representatives.
"The main driver is the constant cost pressure that businesses are under," he says. "They're always looking for ways to get costs out of their business whilst delivering an acceptable customer experience. And since people will not wait in a queue for very long, Australian businesses face the fact that they are either one click or one phone call away from oblivion--because if they're not offering speech recognition, their competitors will be."
Learning to talk
Despite its benefits, the technical aspects of speech recognition implementations are relatively painless thanks to more than a decade in which the technology grew from being a laboratory fantasy to an inexpensive, practical new interface for human-machine interaction. This growth came as researchers, dedicated to finding a way to help computers understand a broad range of words and spoken accents, steadily improved their software's ability to recognise discrete phonemes. Recognising these sounds, which make up the basic units of all speech, lies at the core of today's speech recognition engines.
In the early days, software basically tried to compare the sound wave of spoken words with those stored in a massive database containing recordings of words. But this approach was quite clunky, forcing speakers to pause after each word and demanding a noticeable amount of time for the system to search through its database. Even then, it wasn't particularly accurate: homonyms, tricky accents, background noise, and user unsophistication made early speech recognition an exercise in frustration.
The technology's big breakthrough came in the late 1980s with the introduction of hidden Markov models (HMM), a mathematical technique for deciding the most probable match between two similar sets of data--in this case, digitised waveforms. Thanks to HMM, speech recognition systems became far more flexible by gaining the ability to compensate for imperfections in the speaker's voice.
Over time, computers became more powerful and were able to pick increasingly small phonemic units out of a stream of speech. This paved the way for natural language speech recognition (NLSR), which finally freed users from the need to speak in halting phrases with breaks between words. Now that NLSR has been moved from the desktop to the server side, the technology has been successfully extended into a massively scalable speech processing infrastructure.
This infrastructure is typically made up of one or several standard servers clustered together into a high-availability node. As a rule of thumb, plan to have one NLSR server per 30 phone lines.
"There was a lot of marketing pressure behind speech recognition in the early and mid 1990s," says Clive Summerfield, a longtime speech researcher who founded Syrinx Speech Systems over a decade ago. "But over the past five years, the technology has started to come of age and is now really moving from high-cost, high-value niche applications into more mainstream style applications."
A one-time Australian success story, Syrinx helped kick-start the global speech recognition market with world-class technology that secured a 6000-line customer care contract with US telecommunications giant AT&T in 1998. Last year, Syrinx installed 180 lines of speech recognition for online trader ComSec, then gained new management and a new name (Sayso!) before going into voluntary administration after a major investor pulled out this year.
"Speech recognition models have been trained on very large databases of speech to create phoneme models," he continues. "Vendors have processed those large databases and therefore the technology now is very robust--and the more they're used, the better they become. This is a true learning machine, and the first of the artificial learning technologies to find wide-ranging commercial applications. When implemented correctly, it can return an extremely effective return on investment very quickly."
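The one-server-per-30-lines rule of thumb quoted above reduces to a single rounding-up division. The short Python sketch below shows that arithmetic only; the function name and the optional "spare" parameter (extra servers for a high-availability cluster) are assumptions added for illustration, and the line counts are simply the figures mentioned in this article, not a description of how those sites were actually built.

```python
import math

# Rule of thumb quoted in the article: one NLSR server per 30 phone lines.
LINES_PER_SERVER = 30


def servers_needed(phone_lines: int, spare: int = 0) -> int:
    """Estimate NLSR servers for a number of phone lines.

    `spare` is a hypothetical allowance for standby servers in a
    high-availability cluster; the article gives no figure for it.
    """
    if phone_lines < 0 or spare < 0:
        raise ValueError("phone_lines and spare must be non-negative")
    # Round up: 31 lines already need a second server.
    return math.ceil(phone_lines / LINES_PER_SERVER) + spare


if __name__ == "__main__":
    print(servers_needed(180))   # 6
    print(servers_needed(6000))  # 200
```

Treat the result as a starting estimate; real capacity depends on the recognition engine, vocabulary size and hardware in use.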
Building a business case
In a time when IT purse strings are being tightened across the board, it's rare to find an application that offers such certain and substantial return on investment. The benefits come in many ways, all of which have quickly made NLSR a priority item on the agendas of business and technology executives alike.
Perhaps the biggest savings come when call centre operators are freed of the burden of being a go-between for the customer and the company's information systems. Within months of implementation earlier this year, for example, Telstra's voice-driven directory assistance system had taken over around 15 percent of the millions of calls to the service every day.
By reducing the time customers spend talking with customer service representatives, speech recognition can increase the number of customer calls that can be handled in an hour, eliminate waiting times, and increase transaction volumes. It also allows any company to service many customer requests 24 hours a day, seven days a week without needing to keep real people in the call centre. It does not take coffee breaks, go home at night or go on strike.
Because of its benefits, speech recognition can be a seriously profitable new channel that also improves the customer experience--two points that should make it easy to build a business case for NLSR. Savings may come through factors such as reduced staffing levels, fewer rostered shifts or reductions in the expense of company-paid 1300 or 1800 lines.
NLSR's financial benefits should increase its appeal to any executive board, as will the technology's exceptional ROI model: typical return on investment for speech recognition projects is less than 12 months. This target has even been exceeded by many of the technology's early adopters, confirming its status as a way of rapidly expanding the business.
"People obviously see the business case behind it, but the only way a CEO will listen to a vendor about voice recognition is when they start talking ROI and dollars," says Luke Chambers, marketing manager of speech recognition integrator CallTime Solutions, which purchased Sayso!'s intellectual property earlier this year and recently received a AU$3.6 million Federal grant for developing speech recognition applications.
Fortunately, this is easy to do because it's relatively easy to distil the potential benefits of speech recognition down to a simple number. You know how many calls your agents are handling and how long they stay on the phone; you know how much the servers for an NLSR solution will cost. Simply compare your current cost per human-assisted call second with the potential cost of a shorter, automated call and it should be easy to see how quickly NLSR will deliver big savings.
"Some businesses have saved up to AU$6000 per call second per year by reducing the handling of calls," says Tim Courtright, managing director of Melbourne-based speech integrator Inflection Technologies. "A lot of calls going to call centre agents are low-value transactions where talking to the agent doesn't add anything to the experience for the business or for the customer. If you can help the customer in 60 to 90 seconds on a speech system as opposed to 180 seconds using IVR (interactive voice response--using the phone keypad to enter numbers or navigate menus with pre-recorded responses) or a person, you're saving the customer time and the company money."
Speech recognition also provides non-financial benefits. Because call centre staffers aren't stuck handling boring, routine transactions, for example, they can get involved in more interesting aspects of customer care such as loyalty programs, strategic marketing, and administration.
Providing more interesting work has already been shown to reduce staff churn--a real problem for call centre operators who have grown tired of investing in staff training only to watch talent get bored and leave. This was a major benefit for Auckland Co-op Taxis, which has seen what chairman & CEO Robert van Heiningen calls "a significant drop of turnover" since implementing speech recognition earlier this year (see sidebar). "People are sticking with us longer than ever before," he reports.
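The cost comparison described in this section, human-assisted cost per call second against a shorter automated call, can be laid out as a small calculation. The Python sketch below is illustrative only: every dollar figure, the call volume and the up-front cost are invented placeholders, and the function and parameter names are assumptions. Only the call durations (180 seconds versus 90 seconds) and the 15 percent automation share echo numbers quoted in the article.

```python
def annual_saving(calls_per_year: float,
                  automation_rate: float,
                  agent_seconds: float,
                  agent_cost_per_second: float,
                  automated_seconds: float,
                  automated_cost_per_second: float) -> float:
    """Yearly saving from moving a share of calls to speech recognition."""
    human_cost_per_call = agent_seconds * agent_cost_per_second
    automated_cost_per_call = automated_seconds * automated_cost_per_second
    automated_calls = calls_per_year * automation_rate
    return automated_calls * (human_cost_per_call - automated_cost_per_call)


def payback_months(upfront_cost: float, saving_per_year: float) -> float:
    """Months needed for the yearly saving to cover the up-front cost."""
    if saving_per_year <= 0:
        raise ValueError("no saving, so the project never pays for itself")
    return 12 * upfront_cost / saving_per_year


if __name__ == "__main__":
    # All inputs below are hypothetical placeholders, not article data.
    saving = annual_saving(
        calls_per_year=1_000_000,
        automation_rate=0.15,             # share of calls handled by speech
        agent_seconds=180,
        agent_cost_per_second=0.01,       # AU$ per second, assumed
        automated_seconds=90,
        automated_cost_per_second=0.002,  # AU$ per second, assumed
    )
    print(f"Annual saving: AU${saving:,.0f}")
    print(f"Payback: {payback_months(200_000, saving):.1f} months")
```

Substituting your own call volumes, handling times and server costs gives the single number the article suggests taking to the board.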
Ease them into it
With benefits almost certain to flow from a speech recognition implementation, IT strategists' work is already half done. Of course, all the usual rules of project management--design, implement, test, tweak, retest--apply. But relatively mature toolkits mean companies implementing speech recognition can focus less on technological specifics and more on mapping process flows and business rules within their organisation.
One area meriting particular attention is the issue of customer acceptance. Although it's great technology, NLSR takes some getting used to, particularly for relative technical novices without the patience to understand how it's best used. If it's pushed too hard onto customers or within the company, it can potentially fall flat on its face.
"The biggest challenge at the moment is end user acceptance," warns CallTime's Chambers. "At times where the recognition does fail, people will get frustrated. Even in a car, for example, you've got a lot of noise that could interrupt the session. The onus is on the company to do their research into the technology, and the challenge for vendors is to create robust applications that will work in any environment--and make the customer happy."
Many customers still aren't, by some accounts. In September, a US study by analyst firm Jupiter Media Metrix (JMM) found that fewer than 40 percent of users prefer speech recognition systems to touch-tone dialling, with 16 percent of those surveyed preferring touch-tone systems over voice. Not surprisingly, young people are far more likely to have used NLSR services than older people.
People who prefer touch-tone dialling may be in the minority, but their presence is a reminder that we have all become used to IVR even if we don't like it. This has implications for the design of speech recognition systems, which should be gradually introduced instead of foisted on customers all at once.
At first, NLSR might only be introduced into a specific part of the IVR sequence, asking customers to say "yes" or "no" to confirm entered details instead of pushing numbers. A good second step would be the introduction of a very limited vocabulary to a specific function--for example, allowing customers to speak their membership numbers instead of punching them in. Today's technology offers all but perfect recognition of limited vocabularies such as spoken numbers or letters, and offering this as the second stage of a speech implementation can be an easy way to get customers used to talking to the computer while minimising their frustration.
When designing an NLSR system, forget most of what you know about IVR, that clunky number-based system now ubiquitous in customer handling applications. Many early speech recognition implementations simply gave customers a familiar list of numbered options then had customers speak the number of the option they wanted. Although this may have been an effort to ease customers into the technology, it also compromises the efficacy of the recognition system.
"Speech recognition has to be fundamentally different from a touchtone solution," says Summerfield. "It's a very different philosophy involved; you're essentially trying to model the way humans actually handle the calls, and then build the speech recognition solution around that."
Helping users
Just as good IVR systems let customers press zero to talk to a person, you should build speech recognition systems so users can easily navigate between major sections using specific keywords. It's not a bad idea to give the system a distinct name that people can use to alert the system they're going to enter a direct command. For example, instead of having to hit the * key to back out of a series of IVR menus, a user might be able to jump from a travel booking to a weather report simply by saying "HAL, tell me the current weather in Adelaide".
The net effect is to construct the NLSR system as a cyber-persona that understands full sentences and can speak back the information to the customer in a natural voice. This approach will engender familiarity with the system as well as saving users from getting lost in the mire of poorly designed IVR systems.
Although it will take customers a while to get used to a new system, in the long term the appeal of being able to skip long phone queues should convince most callers to get aboard. Since there will always be the inevitable problems, make sure customers can always get help from a human operator if they need it--either by pressing a specific key or by not responding when prompted.
"We don't want customers to get up, sing and dance like they've just had a mind-altering revelation," says Paul Magee, managing director of speech recognition vendor VeCommerce. "The frontier is making the technology work in a way that makes sense to people. If they can just do a transaction without them even noticing it, we've done our job."
"Doing this is not a computing issue," Magee continues. "At the core, the technology works. The real issue is layering a real set of rules that allows customers to say what they want the way it makes sense to them. This is in the design of the application, error handling, and all of the other grey and very humanistic issues around deploying the technology."
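The navigation rules described in this section (a system name that flags a direct command, keywords that jump between sections, and a fallback to a human operator on a key press or on silence) can be sketched as a simple routing function. The Python below is a minimal illustration, not any vendor's design: it assumes the recognition engine has already turned the caller's speech into text, and the section names, the keyword table and the choice of "0" as the operator key are all hypothetical.

```python
import re

SYSTEM_NAME = "hal"   # the example name used in the article
OPERATOR_KEY = "0"    # assumed key for reaching a person

# Hypothetical keyword-to-section table.
SECTIONS = {
    "weather": "weather_report",
    "booking": "travel_booking",
}


def route(recognised_text: str, current_section: str) -> str:
    """Return the section the caller should be in after this turn."""
    text = (recognised_text or "").strip().lower()

    # No response to the prompt, or the operator key: hand over to a person.
    if not text or text == OPERATOR_KEY:
        return "operator"

    words = re.findall(r"[a-z0-9']+", text)
    # A direct command starts with the system's name, then a section keyword.
    if words and words[0] == SYSTEM_NAME:
        for word in words[1:]:
            if word in SECTIONS:
                return SECTIONS[word]

    # Anything else is treated as input to the current section.
    return current_section


if __name__ == "__main__":
    print(route("HAL, tell me the current weather in Adelaide",
                "travel_booking"))   # weather_report
    print(route("", "travel_booking"))  # operator
```

A production system would lean on the engine's grammars and confidence scores rather than plain keyword matching, but the routing decisions would follow the same shape.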
Use your voice everywhere
Most of the current attention in the speech recognition market is focused on company call centres. As the technology continues to worm its way into Australia's business psyche, however, it will quickly extend its reach into a variety of other applications.
Sydney company Holly, for one, offers a hosted voice recognition service that's being bundled in several forms for various customers. The most widely used is a retail information service targeted at mobile professionals who call in via their mobiles to get current stock prices and other information. Citing the high cost of building brand recognition, however, Holly's directors have focused the company on selling the voice portal's core technologies in packages for integration with companies' call centres and back-end systems.
"There's a large potential to replace IVR systems, and to a certain degree some call centre services, with things based on voice recognition," says chief operating officer Michael Atkinson. "We're getting to the point where the functionality is 80 percent of what people want, and since it's a fully open system we can go to systems integrators to customise it and deliver the other 20 percent."
What about the desktop?
Customers' current focus on speech-enabling call centres has also masked the potential of older but less popular desktop speech recognition software. Originally positioned by technology visionaries like Bill Gates as a major step forward in usability, desktop speech recognition packages like Dragon Systems' NaturallySpeaking and IBM's ViaVoice have failed to ignite the popular imagination due to their high consumption of system resources, far from infallible design, and reduced effectiveness in noisy (read: office) environments.
These issues have kept speech recognition from becoming more than a blip on the radar screen of priorities at most companies, where it's typically installed on a few scattered computers where requested by particularly poor typists. But it's hard for most people to get used to audibly punctuating every sentence and inserting every comma, something that's continued to hold the technology back. Not even the inclusion of speech recognition in Microsoft's Office XP is likely to change this in the short term, since its ubiquity doesn't change its usability problems.
In the short term, voice may find a more receptive audience on the Web, where technology companies have long been working to add voice support. Their intention is to allow access to Web sites through both mobile and conventional phones, but lack of technical standards and the complexity of remote multimedia support have held them back in the past.
There is a strong momentum for voice-enabling the Web, however. In a recent ACA survey, 43 percent of respondents said it was "quite likely" they would allow access to their Web site using speech recognition, while a further 13 percent said it was "very likely". Although the success of such projects will clearly depend on user adoption in the long term, they are now closer than ever thanks to VoiceXML (Voice eXtensible Markup Language), which recently entered version 2.0.
An offshoot of XML, VoiceXML enables voice interaction with Web sites using meta tags to add meaning to the online content. Ultimately, VoiceXML support will allow customers to dial into a Web server and navigate the site using their voice. A text-to-speech engine will convert key pieces of text to voice for listening to content, while information will be sent to the Web server using speech recognition and an overlay that places spoken text into the appropriate fields based on their contextual VoiceXML tags. Widespread VoiceXML use is still some time away, but it's worth investigating as part of any broader strategy to add voice to your selection of customer contact channels.
While speech recognition systems are never going to completely supplant other forms of customer interaction, they are now more than holding their own as an efficient, cost-effective upgrade that should be on next year's wish list for any company that cares about its customers.
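For readers curious what the VoiceXML approach described above looks like in practice, the Python sketch below generates a very small VoiceXML 2.0 document: a form with one field that prompts the caller for a city, listens against a grammar, and submits the recognised value to a Web server. The element names (vxml, form, field, prompt, grammar, block, submit) come from the VoiceXML 2.0 specification; the URL, the grammar file name and the weather scenario are placeholders invented for illustration.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"


def build_weather_form(submit_url: str) -> str:
    """Build a minimal VoiceXML 2.0 dialog as a string (illustrative only)."""
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": "weather"})

    # The field holds whatever the recogniser hears for "city".
    field = ET.SubElement(form, "field", {"name": "city"})
    prompt = ET.SubElement(field, "prompt")
    prompt.text = "Which city would you like the weather for?"
    # Hypothetical grammar file listing the city names to listen for.
    ET.SubElement(field, "grammar",
                  {"src": "cities.grxml", "type": "application/srgs+xml"})

    # Once the field is filled, send its value to the Web server.
    block = ET.SubElement(form, "block")
    ET.SubElement(block, "submit", {"next": submit_url, "namelist": "city"})

    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    # example.com is a placeholder address, not a real service.
    print(build_weather_form("http://example.com/weather"))
```

A voice browser on the telephony side would fetch a document like this, speak the prompt through its text-to-speech engine and post the caller's answer back to the server, which is the flow the article describes.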
Slow approach, rapid returns
Speech recognition can deliver massive benefits very quickly, but you have to take the right approach. "If you don't design it very cleverly, you ultimately won't get the business objectives met and customers won't use the system," says Inflection Technologies' managing director, Tim Courtright. "What you have to bring to the party is the process of guiding them from the beginning of the sales cycle to deliver the business outcomes at the end of the day." Here are 10 pointers to get you there:
Copyright © 2009 CBS Interactive, a CBS Company. All Rights Reserved.