Conversational Interfaces: Advances and Challenges
VICTOR W. ZUE
JAMES R. GLASS, MEMBER, IEEE
The past decade has witnessed the emergence of a new breed of
human–computer interfaces that combines several human language
technologies to enable humans to converse with computers using
spoken dialogue for information access, creation, and processing.
In this paper, we introduce the nature of these conversational
interfaces and describe the underlying human language technologies on
which they are based. After summarizing some of the recent progress
in this area around the world, we discuss development issues faced
by researchers creating these kinds of systems and present some of
the ongoing and unmet research challenges in this field.
Keywords: Conversational interfaces, speech understanding
systems, spoken dialogue systems.
Computers are fast becoming a ubiquitous part of our
lives, brought on by their rapid increase in performance and
decrease in cost. With their increased availability comes
the corresponding increase in our appetite for information.
Today, for example, nearly half the population of North
America are users of the World Wide Web, and the growth
is continuing at an astronomical rate. Vast amounts of useful
information are being made widely available, and people are
utilizing it routinely for education, decision making, finance,
and entertainment. Increasingly, people are interested in
being able to access the information when they are on the
move—anytime, anywhere, and in their native language.
A promising solution to this problem, especially for small,
handheld devices where a conventional keyboard and mouse
can be impractical, is to impart human-like capabilities
to machines so that they can speak and hear, just like the
users with whom they need to interact. Spoken language is
attractive because it is the most natural, efficient, flexible,
and inexpensive means of communication among humans.
Manuscript received January 7, 2000; revised April 25, 2000. This work
was supported by DARPA under Contract N66001-99-1-8904, monitored
through the Naval Command, Control and Ocean Surveillance Center.
The authors are with the Laboratory for Computer Science,
Massachusetts Institute of Technology, Cambridge, MA 02139 USA.
Publisher Item Identifier S 0018-9219(00)08092-0.
When one thinks about a speech-based interface, two
technologies immediately come to mind: speech recognition
and speech synthesis. There is no doubt that these
are important and as yet unsolved problems in their own
right, with a clear set of applications that include document
preparation and audio indexing. However, these technologies
by themselves are often only a part of the interface
solution. Many applications that lend themselves to spoken
input/output—inquiring about weather or making travel
arrangements—are in fact exercises in information access
and/or interactive problem solving. The solution is often