Opinion & Editorial

Untangling how users interact with the voice Web: building intelligence into voice-based apps
By Amy Neustein, Ph.D.
April 2002

Speech recognition problems still limit the impact the Voice Web should be having on the V-commerce industry. However, the means exist to overcome these obstacles to a growing business.

As a start, Seattle-based Voice Web Services provides a smart, intuitive and mobile solution for accessing valuable information through simple voice commands. Vanguard (Voice Activated Navigational Guide for Any Readable Document) can navigate text information using speech recognition technology that responds to natural language voice commands. For example, a user can choose the portion of an email attachment he or she wants to hear by saying, "Go to the third paragraph," and the system will respond accordingly.

But what happens when users, in the course of navigating text information, deviate from "standard" natural language commands, lapsing into a totally natural way of speaking in which they use directives peculiar to their own style of communicating? For instance, a user might say, "Give me the third item in." Secretaries have long understood expressions like this easily enough. But voice application services that cannot deviate from standard grammars will be unable to make sense of natural language commands that fail to conform to preset word strings.
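The gap between a preset word string and a looser reading of the same request can be illustrated with a minimal sketch. Everything in it is hypothetical: the names ORDINALS, parse_strict and parse_loose, the tiny vocabulary, and the single command pattern are assumptions made for illustration, and the loose matcher is plain keyword spotting, not Sequence Package Analysis or any vendor's method.

```python
import re

# Hypothetical ordinal vocabulary; a real system would cover far more.
ORDINALS = {"first": 1, "second": 2, "third": 3, "fourth": 4, "fifth": 5}

# Rigid grammar: only "go to the <ordinal> paragraph" is accepted.
STRICT = re.compile(r"^go to the (\w+) paragraph$")


def parse_strict(utterance):
    """Return a position number, or None if the utterance is off-grammar."""
    match = STRICT.match(utterance.lower().strip(" ."))
    if match is None:
        return None
    return ORDINALS.get(match.group(1))


def parse_loose(utterance):
    """Return the first ordinal spotted anywhere in the utterance, else None."""
    for word in re.findall(r"[a-z]+", utterance.lower()):
        if word in ORDINALS:
            return ORDINALS[word]
    return None


if __name__ == "__main__":
    for spoken in ("Go to the third paragraph.", "Give me the third item in"):
        print(spoken, "->", parse_strict(spoken), parse_loose(spoken))
```

The strict parser handles only the first utterance, while the keyword spotter recovers the ordinal from both. Keyword spotting is still crude: it ignores word order and context, which is the kind of information a discourse-level grammar would have to supply.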

Sequence Package Analysis, developed by Linguistic Technology Systems, a New York area-based think tank for new solutions for speech-enabled products and services, might be one way of helping the Voice Web accommodate the user's way of speaking. This new tool for natural language application software -- reported in NextInterface ("Why Linguistics is Important for the Design of a Non Fictional Hal," January 2002) -- works by capturing the unique speech characteristics of each speaker through the application of a grammar that arranges these features in the form of "sequence packages."

For instance, a speaker who tells his voice mail, "Give me the third item in" when accessing mobile documents -- or who provides "unintelligible" menu options when making an airline reservation -- will be intelligible to the Voice Web only when the discourse grammars are able to identify the discrete patterns of sequentially organized conversational features, consistent with an expansive range of intuitive natural language commands, rather than a few strings of standard lexical entries.

It was not too long ago that we had trouble imagining how a speech recognition engine would be able to recognize a speaker's voice. Now that industry analysts predict that within a few years one-third of all households will use voice portals to search for information online or to access emails and attachments, it is plausible that the next step is to have speech engines recognize the non-standard discourse patterns indigenous to each speaker, which are nevertheless patterned and robust, much like personality profiles.

Fortunately, since these speech patterns can often be as predictable and robust as personality profiles, improved natural language software will be able to decipher them while remaining scalable. And there's plenty of evidence that the speech technology industry is looking for such advanced developments in Voice XML application and server software.

In May, the Voice XML Forum will hold its meeting in conjunction with the AVIOS (Applied Voice Input/Output Society) Conference in San Jose, so that attendees can take advantage of the Voice XML User's Group Meeting. Following in the fall is the SpeechTek 2002 International Exposition and Educational Conference, to be co-chaired by Dr. James A. Larson, Manager of Advanced Human Input/Output at Intel Corporation, chairman of the W3C voice browser working group (which sets the standards for Voice XML and related languages for voice portals) and author of VoiceXML: An Introduction to Developing Speech Applications (forthcoming).

Designing intelligent voice application software to better understand the natural language components of the user's interface with the Voice Web is clearly the way of the future.

Amy Neustein, Ph.D. is a member of the Editorial Advisory Board of Speech Technology Magazine. She is also president and founder of Linguistic Technology Systems.