HLT Workshop Bridges the Gap in Dialog
By SVETLANA STENCHIKOVA
Commercial dialog systems are
becoming ubiquitous - most banks, telephone companies, and other businesses have
already automated their call centers. Dialog systems are also a subject of
contemporary academic and industrial research. However, there is a large gap
between commercial practice and research. Industrial dialog systems
tend to use robust but inflexible solutions. VoiceXML, a markup language
extending XML, allows quick implementation of a dialog system. For
large systems, however, it is cumbersome and inflexible because of its linear structure. VoiceXML
is the prevailing technology for dialog systems in industry, while research
systems use more flexible but also more complex technologies, including
Information State Update dialog managers, distributed system architectures,
dialog managers with automatically learned rules, and others. Techniques used in
research are to a large degree underutilized in commercial systems today.
To discuss these issues, the workshop
Bridging the Gap: Academic and Industrial Research in Dialog
Technologies took place immediately following NAACL/HLT
2007. It brought together researchers from both industry and academia to discuss
issues important to both communities working on dialog systems. The workshop speakers presented
a variety of projects including applications of machine learning to dialog
management, improvements in speech recognition, analysis of the cost and value
of Wizard-of-Oz experiments for optimizing call center dialog systems, and
evaluation of dialog systems. A range of speech systems was
presented, from a hardware and Internet troubleshooting dialog system that removes
the need for a live agent to a wearable language understanding system for the
military.
Panel on limitations of dialog systems
The workshop featured a panel
discussion session on the limitations of dialog systems with panelists from both
research and industry.
All panelists agreed that designing
a dialog system is a difficult and time-consuming task. Building systems "is
more art than science", panelist Roberto Pieraccini of SpeechCycle pointed
out. Standardizing system design processes with technologies currently used in
research would facilitate carry-over from research to industry.
Mazin Gilbert, a panelist from
AT&T research labs, argued that the metric for dialog evaluation should focus
more on customer experience than on dialog completion. He also pointed to
the importance of Natural Language Generation, saying that "people do not like
automated hand-crafted responses".
Panelist Professor James Allen
predicted that dialog applications will shift from call centers
to personal devices, which will require systems to
engage in more of a "collaborative back-and-forth" with the user.
It would not be easy to adapt
VoiceXML technology to such systems. Professor Alex Rudnicky pointed
out that "no fancy dialog management matters if the Word Error Rate (WER) is too
high". Moreover, an acceptable WER of below 15% is not always achievable under noisy
conditions.
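For readers unfamiliar with the metric, WER is conventionally computed as the number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of words in the reference. The sketch below is only an illustration of that standard definition, not code from any system presented at the workshop; the function name `word_error_rate`, the whitespace tokenization, and the sample utterances are assumptions made for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative only: tokenization is plain whitespace splitting,
    which real evaluations usually refine (case, punctuation, etc.).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution, or match if cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical utterance from a conference information domain.
reference = "show me the keynote schedule for monday"
hypothesis = "show me a keynote schedule monday"
wer = word_error_rate(reference, hypothesis)
# The 15% figure is the threshold quoted in the panel discussion.
print(f"WER = {wer:.1%}, below 15%: {wer < 0.15}")
```

In this made-up example one word is substituted and one is dropped out of seven, so even a short utterance with two recognition errors lands well above the 15% level mentioned by the panel.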
A perennial question for
dialog system designers was also raised: "should the Automatic Speech Recognition of the system
be statistical or grammar-based?" Grammar-based systems
are known to perform better for experienced users, while statistical systems
provide more coverage and are better at accommodating novices. Panelists
discussed the advantages and disadvantages of both approaches. They concluded
that, depending on the application, there is room for both techniques, and that a
clever combination of the two could lead to better performance.
Another interesting question was raised: "What is the role of machine
learning (ML) in dialog management?" Although ML is a powerful technique, there
are doubts about its scalability. The panelists pointed out that ML solves
tuning issues, but a human still has to design the structure of the problem.
Panelist Professor Michael McTear said that POMDPs (partially observable Markov decision processes, an ML technique currently used
for dialog management) do not resemble how humans handle conversations. He also
addressed the problem that dynamically evolving dialogs (where the state space is not
known in advance) pose for POMDP learning.
Panelists discussed the dialog
system applications of the near future, including smart answering machines, games
(it is surprising that natural language dialog has not penetrated the game
market yet!), assistants for elderly people, and dialog interfaces to PC
software.
The audience was also
interested in the question of standardization for dialog system components.
In an ideal world, standardization would allow researchers to post components as
web services and swap them transparently. Although everyone could see the benefits
of standardization, it is difficult to see how concepts like semantics could be
generalized: "the best we can do is to standardize meta-data" was the comment
from the panelists.
Panel on collecting dialog data for the community
The workshop concluded with the
Panel on Spoken Dialog Corpus Composition and Annotation for Research,
which discussed a new dialog dataset for the research community.
Researchers from AT&T and CMU
are collaborating to create a resource of human-computer dialogs in the domain
of conference information. Two systems were developed for this task: CMU's
ConQuest (developed with the distributed Ravenclaw/Olympus architecture) and
DiSCoH, the system from AT&T. Both systems were deployed during
SLT-2006 and Interspeech 2007,
collecting data from the conference participants.
Once collected, the dataset
will be freely available to everyone.
The questions "how much data do
we need?" and "what should the annotations be?" were discussed. Although "the
more the better" applies to both the amount of data and the amount of annotation, both come at a
cost. The consensus was to collect as much data as possible
with automatic annotations and make it available to everyone.
Researchers using this data are encouraged to make their annotations available
to the community. A flexible layered annotation scheme is important in order to
allow contributions from different sources.
The question of the richness of
the conference information domain was brought up: what is the depth of
interaction in this domain, and does it involve anaphora and deictic expressions?
Although the domain was already
selected, the panelists addressed the question of what other domains would
interest researchers. One of the proposed domains was "systems for users with
real information needs".
If you would like to contribute
your opinion on the questions discussed in the panel, please
fill out this survey.