Designing the Virtual Agent: Some Theoretical and Practical Considerations
By Amy Neustein, Ph.D. (firstname.lastname@example.org), President and Founder, Linguistic Technology Systems
As we move toward fully interactive, speech-enabled call centers, and the human agent gradually becomes a mere emblem of the past, the "human-like" features of virtual customer service representatives must be addressed. If virtual operators are to replace their human counterparts successfully, we need not only to unravel the mystery of the "human touch" but also to reduce its complex essence to a set of formulaic principles that can be readily applied to the design of conversational interfaces.
Here begins the controversy: if conversational interfaces appear too "polite" or "chatty" (in essence, too "human"), this can be vexing to the user. If, on the other hand, the dialogue system appears "machine-like" or robotic, this may undermine user satisfaction. This dilemma has so far divided the experts. In How to Build a Speech Recognition Application (1999), Bruce Balentine and David P. Morgan argue that a well-crafted personality [for conversational interfaces] that is distinctly machine-like, rather than human-like, can achieve user satisfaction and simultaneously avoid "unconstrained user behaviors, social assumptions, and the evocation of emotions" that occur when machines attempt to personify human interaction (p. 226). While the authors present a strong argument for preserving the machine-like qualities of speech interfaces, system designers appear to be responding to industry demands for more human-like conversational interfaces, so that callers may truly engage in a natural language dialogue with interactive voice response systems.
To design a system that effectively simulates human-to-human dialogue, it is essential to first understand the patterns of human interactions; otherwise the system produces a transparently ersatz conversation, not a true simulation of natural interactions.
For example, speech recognition engines that prompt users for clarifications in increasingly apologetic tones ("I'm sorry, I didn't understand you, would you please repeat your selection? . . . I'm really sorry, but I can't understand you, would you kindly repeat your selection?") are markedly out of step with the flow of actual human dialogue, in which repeated requests for clarification do not follow obsequiously worded appeals, even in relationships where power and status are unevenly distributed.
While a discussion of the natural language patterns of error correction and clarification requests is beyond the scope of this article, speakers do follow a distinct pattern when they seek clarification and correct errors (and possible misunderstandings). A system whose goal is to emulate the human agent's interactive capabilities must, accordingly, incorporate the various ways that interlocutors are found to accomplish this interactive work in human-to-human dialogue.
In the design of the virtual agent, it is not only necessary to study the patterns of naturally occurring conversational interaction. Particular attention must be paid to the grammatical features underlying those verbal interactions that go awry. For example, in help-line discourse, although callers and operators often successfully negotiate mutual understandings, agreements and compromises, failures also occur: callers can reject well-intentioned advice offered by agents; the main purpose of the call may be eclipsed by digressive chatter; or the caller may feel the agent has abruptly terminated the call, especially in those instances where customer service representatives do not automatically query the caller about additional service items. A speech interface's function may be greatly enhanced by an ability to detect such possible communication mishaps and to steer the dialogue away from them.
Another design debate centers on how "intelligent" virtual agents actually need to be. Balentine and Morgan, in the book cited above, claim that a well-designed system must be "predictable, clear, well organized, uncluttered, and comprehensible," but that it does not necessarily need the sophisticated intelligence found in human interactions (p. 198). In an article in Speech Technology Magazine (1999), I supported this argument by showing how idiomatic expressions that are too recondite to be intelligently understood by a speech interface (in the absence of relevant socio-cultural knowledge) may nevertheless be grasped by a system less based on intelligent understanding than on careful design.
Whereas an intelligent system that attempts to incorporate a great mass of background knowledge can be quite expensive and complex to design, and can easily err because intelligence about the world is truly inexhaustible, a system that is less intelligent, but more carefully designed, may actually be better able to engage in a natural language dialogue with its user.
For example, if a caller exclaims to a virtual agent, "I haven't left a stone unturned!" a system built on intelligence might erroneously conclude that the speaker is referring to an archeological dig. In contrast, a system that is skillfully designed, but has less commonsense knowledge, might simply look at the sequential arrangement of utterances containing the idiom, thereby deducing that the speaker is expressing great frustration, rather than speaking about an excavation.
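The design-over-intelligence approach can be illustrated with a minimal Python sketch. It is only a toy under stated assumptions: the cue-word list, the three-turn window, and the function names are hypothetical choices made for illustration, not the method described in the Speech Technology Magazine article. The sketch knows nothing about stones or excavations; it looks only at what the caller said in the turns leading up to the idiom.

```python
import re

# Hypothetical cue words (an assumption for illustration, not from the article).
COMPLAINT_CUES = {"still", "again", "already", "tried", "called", "waiting", "nobody"}
IDIOM = "left a stone unturned"


def tokens(utterance: str) -> set[str]:
    """Return the lowercase word tokens of an utterance."""
    return set(re.findall(r"[a-z']+", utterance.lower()))


def interpret_idiom(caller_turns: list[str], window: int = 3) -> str:
    """Classify the idiom by the caller turns that precede it.

    Returns "frustration" if any of the preceding `window` turns contains a
    complaint cue, "unknown" if the idiom occurs without such context, and
    "no idiom" if the idiom never appears.
    """
    for i, turn in enumerate(caller_turns):
        if IDIOM in turn.lower():
            preceding = caller_turns[max(0, i - window):i]
            if any(tokens(t) & COMPLAINT_CUES for t in preceding):
                return "frustration"
            return "unknown"
    return "no idiom"
```

Given the turns "I called twice last week." and "I tried resetting the modem already." followed by the idiom, the sketch reports frustration without any world knowledge. A production system would need a far richer model of sequential structure than a cue-word list.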
Lastly, call center multitasking demands quick and efficient responses to e-mails in addition to the duties of handling and directing calls. As e-mail increasingly becomes one of the most common methods of communicating with call centers, virtual agents must be able to understand and reply to e-mails just as easily as they engage in natural language dialogues with callers. Perhaps because e-mail is a relatively new form of communication, misconceptions about its style and function abound. Purely perfunctory response systems are now in place to provide a more or less standard response to e-mail messages; much of the content of these messages is undoubtedly being lost in these standardized response systems.
The design of virtual agent programs that can effectively respond to e-mail messages depends, first, on a proper understanding of this new medium. Major rethinking is required. Today, e-mail is usually misunderstood because attention is concentrated on its written form, when in actuality it more closely resembles spoken dialogue.
For example, an e-mail message can fuse disparate subjects within one paragraph, as is often the case in a natural language dialogue, whereas discrete paragraph demarcations are much more typical of most written text. Also, the technical ease with which an e-mail user can take excerpts from another person's e-mail, responding line by line, permits an interactive dialogue not much different from the exchange of speaking turns in conversations. Thus, in the same fashion that virtual agents might detect specific patterns of natural language dialogue, they may be designed to detect similar patterns of communication appearing in e-mail messages.
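The turn-like structure of a line-by-line e-mail reply can be made concrete with a short Python sketch. It assumes the common convention that quoted lines begin with ">"; the function name and the "quoted" and "reply" labels are hypothetical, and real messages (nested quotes, signatures, HTML mail) would need more careful handling.

```python
def email_to_turns(body: str) -> list[tuple[str, str]]:
    """Group the lines of an e-mail reply into (speaker, text) turns.

    Lines starting with ">" are attributed to the "quoted" party; all other
    non-blank lines are attributed to the "reply" author. Consecutive lines
    from the same party are merged into one turn.
    """
    turns: list[tuple[str, str]] = []
    for raw in body.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines carry no turn content
        if line.startswith(">"):
            speaker, text = "quoted", line.lstrip("> ").strip()
        else:
            speaker, text = "reply", line
        if not text:
            continue  # a bare ">" line has nothing to add
        if turns and turns[-1][0] == speaker:
            # Same party continues: extend the current turn.
            turns[-1] = (speaker, turns[-1][1] + " " + text)
        else:
            turns.append((speaker, text))
    return turns
```

Once a message is represented as alternating turns, the same sequential analysis a virtual agent might apply to a spoken dialogue could, in principle, be applied to it.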
Inasmuch as call center technology is highly structured and sophisticated, the design of virtual operators or agents demands rigorous planning and consideration. Unfortunately, conversational interfaces often do not receive the thoroughness and planning they deserve. Scanty attempts are made to construct speech interfaces that simulate "human-like" features without properly researching the structural organization of conversational interaction.
Such attempts are likely to produce automated agents that are no more human-like than a telephone pole, and that may irritate or anger their users. While it may be difficult and costly to design simulacra that can service a caller as satisfactorily as a human can, the benefits of well-designed speech interfaces are of unquestionable value to call center technology.