Your word is my command

By Amy Neustein
June 27, 2001

Imagine talking to your microwave about how long to cook your favorite roast, or asking your mobile phone to contact the weather service in a distant city. Then imagine yourself lying on an operating room table while your physician gives voice instructions to a robot that controls delicate surgical instruments.

Just a few years ago these scenarios could only have occurred in a science fiction movie like the visionary 2001: A Space Odyssey. But today there are products and services with built-in speech-recognition capacity that can actually process voice commands, enabling them to serve you anywhere, without the need for a keyboard. Using your own voice you can access stock quotes, movie listings, airline reservations and flight information; get directions; find medical specialists in your area; and even order flowers.

The current explosion in speech-enabled technology (once principally thought of as a tool for disabled users unable to access keyboards and monitors) now allows the human voice to be better understood in spite of variations in dialect or accent. For instance, Verbal Tek, a Santa Clara, Calif.-based company, is now testing a PDA that can recognize the high-pitched voices of children and the English of non-native speakers. Smart auto attendants that allow users to access their own e-mail, faxes, and voice mail messages are the result of some of these recent developments in voice technology. A voice-activated assistant named SANDi, for example -- a product of Irvine, Calif.-based telecom company Sound Advantage -- can read you your e-mails and faxes, relay phone messages, and even recite your calendar of appointments, including time and location details. SANDi received the Product of the Year Award in 2000 from Customer Interaction Solutions and Computer Telephony for both quality and innovation.

Today's software industry is rich in voice technology companies, sporting such names as Chicago-based ShopTalk, Atlanta-based Fast-Talk, and Ann Arbor, Mich.-based Just Talk -- all trying to thrive in this relatively new industry. Speaking in May at VentureDowntown, the annual conference of the New York New Media Association, Alan Patricof, founder of New York-based Patricof & Co. Ventures and New York Magazine, was emphatic about their potential: "Voice technology is a major opportunity for the future," he said.

Earlier this year, against the backdrop of the majestic Camelback Mountain in Scottsdale, Ariz., the third annual Telephony Voice User Interface Conference brought together engineers and entrepreneurs from around the world for an unveiling of new voice-activated products and services -- devices that bring the capacity for speech of HAL, the computer character in the movie 2001, closer than ever before.

The conference organizer, Dr. William Meisel, president of TMA Associates in Tarzana, Calif., kicked off three days of talks on voice user interface (the addition of speech recognition capability to the touch-tone key pad) with a statement about the great strides made in speech recognition: "Speech recognition error rates are dropping annually by 30 percent," he said. Such technical progress can have revolutionary effects. In fact, industry analysts herald the voice Web, which gives a user access to Internet Web sites for movie listings, stock quotes, or weather and traffic reports via the telephone (rather than through a PC) as the most important innovation in technology since the advent of personal computing.
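To put Meisel's figure in perspective, a 30 percent annual drop compounds: each year's error rate is 70 percent of the previous year's. The short Python sketch below works through that arithmetic. The 10 percent starting error rate and the function name are illustrative assumptions, not figures from the conference.

```python
def projected_error_rate(initial_rate: float, annual_drop: float, years: int) -> float:
    """Return the error rate after `years` of compounding annual reductions."""
    return initial_rate * (1.0 - annual_drop) ** years


if __name__ == "__main__":
    # Hypothetical starting point: a recognizer that misrecognizes 10 percent
    # of its input, improving by the quoted 30 percent each year.
    for year in range(6):
        rate = projected_error_rate(0.10, 0.30, year)
        print(f"year {year}: {rate:.2%}")
```

Under these assumptions the error rate roughly halves every two years, since 0.7 x 0.7 = 0.49.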

Among other things, the conference emphasized the great strides made by speech recognition and voice technology companies in accessing Internet sites both from mobile and landline phones. Natural Speech Communications, a speech recognition company nestled in a seaside suburb of Tel Aviv, Israel, has developed an unusually compact form of technology that can dramatically decrease the time it takes a caller to connect to the voice Web. Such compact hardware permits voice portal companies to give each caller a one-to-one ratio with voice portals -- telephone hookups to the Internet -- so that there is no time lapse while the system is hooking up its other users. And with industry projections that one-third of all households will use voice portals in just a few years, any technology that accelerates connection to the voice Web may be instrumental in preventing a major "traffic jam" among its users.

Speaking at the conference, industry expert Dr. Chester Anderson III, senior vice president of business development at Sound Advantage, summed up the incremental developments in speech technology by declaring that the industry has progressed from the "information age" to the "wisdom age."

Wisdom age

The "wisdom age" that Anderson speaks about was already in evidence at the April conference and expo of the AVIOS (Applied Voice Input/Voice Output Society), held in San Jose, Calif. Several new voice products and services debuted at the conference, but probably the most promising sign of things to come was the emergence of a well-orchestrated voice technology "food chain" -- a healthy interdependence among companies that produce dovetailing technology to broaden speech applications.

For instance, Voice Signal Technologies, in Woburn, Mass., develops and licenses speech interface products for mobile devices, interactive toys, automotive applications, and smart-home controls. According to Dr. Jordan Cohen, CTO of Voice Signal Technologies, the company's products are intended to give users an easy, reliable, and natural way to interact with the devices they already use every day. Cell phone carriers who license Voice Signal Technologies' software can allow their mobile phone users to access Internet browsers from a mobile phone simply by saying "browser," or they can have their customers view the list of calls received simply by saying "calls received." Voice Signal software runs on microprocessor platforms that are already common in consumer devices, giving manufacturers a cost-effective way to provide their customers with state-of-the-art voice control.
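The "browser" and "calls received" commands described above amount to mapping a small vocabulary of recognized phrases to device actions. The Python sketch below illustrates that idea only; it is not Voice Signal Technologies' software. It assumes a speech recognizer has already turned the audio into a text transcript, and every function and variable name in it is hypothetical.

```python
from typing import Callable, Dict, Optional


def open_browser() -> str:
    # Stand-in for launching the phone's Internet browser.
    return "opening browser"


def show_calls_received() -> str:
    # Stand-in for displaying the list of received calls.
    return "showing calls received"


# Hypothetical command vocabulary: spoken phrase -> device action.
COMMANDS: Dict[str, Callable[[], str]] = {
    "browser": open_browser,
    "calls received": show_calls_received,
}


def normalize(transcript: str) -> str:
    """Lowercase the transcript and collapse runs of whitespace."""
    return " ".join(transcript.lower().split())


def handle_utterance(transcript: str) -> Optional[str]:
    """Run the action for a recognized command, or return None if unknown."""
    action = COMMANDS.get(normalize(transcript))
    return action() if action is not None else None


if __name__ == "__main__":
    print(handle_utterance("Browser"))
    print(handle_utterance("calls received"))
    print(handle_utterance("weather"))
```

Keeping the vocabulary this small is one reason such a feature can plausibly run on the modest processors already found in consumer devices: the recognizer only has to distinguish among a handful of phrases.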

The industry refers to such products as "embedded" technology, meaning that they build upon existing circuitry or microprocessors (chips), integrating new software with already standard platforms. Inzigo, a Montreal-based company that is among the leading providers of natural language applications, has designed a service engine that adds a layer of intelligence on top of speech recognizers. Such service engines are designed to help speech recognizers better understand what a speaker is saying. "The Inzigo [speech recognition] platform is designed to integrate with all of the leading speech recognition engines: Nuance, SpeechWorks, Philips, Locus, and IBM," explains Mobeen Khan, the company's co-founder and CEO.

Gold rush, but with caution

There is no mistaking the gold rush mentality of speech technology developers -- they are clearly exhilarated by how the human voice can tap a reservoir of Web information, or how audio text (known as text-to-speech) can read e-mails and faxes to busy executives away from their desks. But a note of caution at this accelerated industry pace comes from human factor design specialists, such as Bruce Balentine, vice president of speech technologies at Denton, Texas-based Enterprise Integration Group and co-author of How to Build a Speech Recognition Application. As one of the nation's leading authorities on human factor design (the scientific study of user interface), Balentine stresses how critical it is to design speech-enabled programs that don't give too many instructions or menu selections for a listener to remember. He also cautions designers to refrain from over-friendly, chatty dialogue that may tempt the user to attempt dialogue too free-flowing and complex for the system to handle.

Speaking at the AVIOS conference in San Jose in April, Dr. Daryle Gardner-Bonneau, who has a Ph.D. in human performance and is founder of Bonneau and Associates in Portage, Mich., addressed the special needs of the elderly population. "All of us are only 'temporarily abled,' " she said, "and all of us will experience both physical and cognitive decline that will affect the quality of our user experience with speech technology as a function of perfectly normal aging." Gardner-Bonneau pointed out some of the special issues that must be considered when designing user interfaces for the elderly, such as automated call centers that handle Social Security and Medicare complaints. Among other things, she emphasized the loss of high-frequency detection and discrimination, the decline in short-term memory, and the need for messages that are "clear and reasonably paced."

With all of these human factor issues taken into account, it is still not unusual to hear consumers express exasperation with the interactive voice response systems they reach when calling most businesses. "If only I could get to speak with a live agent" is a commonly heard complaint. But industry developers remind us that we are in a transitional phase while service industries gradually shift from the use of human agents to reliance on automated systems.

The plain fact is that today's around-the-clock lifestyle makes it impossible for human operators to be available whenever the public demands service. A huge sales effort is now aimed at selling businesses on the concept of automated call centers that use voice user interfaces to handle customer complaints and requests for products and services. The ease of replacing the human agent with his or her "virtual" counterpart, and the dollar advantage that goes with it, provide the mantras for mass marketing voice-activated call centers to American wholesalers and retailers.

Still, recognizing that consumer complaints cannot be taken lightly, the speech recognition industry is seriously considering how to improve applications so as to make speech-enabled products and services more user-friendly. Human factor specialists often work side by side with system designers to make sure applications effectively deliver the services they intend. Ultimately, the voice of the consumer will be heard above the voice of the industry, and industry analysts know that.

Amy Neustein, Ph.D., is president and founder of Linguistic Technology Systems, a New York-based think tank.


Copyright 2001 InfoWorld Media Group, Inc.