
IEEE Signal Processing Society
Speech & Language Technical Committee


HLT Workshop Bridges the Gap in Dialog

By SVETLANA STENCHIKOVA

Commercial dialog systems are becoming ubiquitous: most banks, telephone companies, and other businesses have already automated their call centers. Dialog systems are also a subject of contemporary academic and industrial research. However, there is a large gap between the two. Industrial dialog systems tend to use robust but inflexible solutions. VoiceXML, a markup language based on XML, allows quick implementation of a dialog system, but for large systems it is cumbersome and inflexible because of its linear structure. VoiceXML is the prevailing technology for dialog systems in industry, while research systems use more flexible but also more complex technologies, including Information State Update dialog managers, distributed system architectures, and dialog managers with automatically learned rules. Techniques used in research remain largely underused in commercial systems today.

To discuss these issues, the workshop "Bridging the Gap: Academic and Industrial Research in Dialog Technologies" took place immediately following NAACL/HLT 2007. It brought together researchers from both industry and academia to discuss issues important to both communities working on dialog systems. The workshop speakers presented a variety of projects, including applications of machine learning to dialog management, improvements in speech recognition, an analysis of the cost and value of Wizard-of-Oz experiments for optimizing call center dialog systems, and evaluation of dialog systems. The speech systems presented ranged from a hardware and Internet troubleshooting dialog system that removes the need for a live agent to a wearable language understanding system for the military.

Panel on limitations of dialog systems

The workshop featured a panel discussion session on the limitations of dialog systems with panelists from both research and industry.

All panelists agreed that designing a dialog system is a difficult and time-consuming task. Building systems "is more art than science," pointed out panelist Roberto Pieraccini of SpeechCycle. Standardizing the system design process with technologies currently used in research would ease the transfer of those technologies from research to industry.

Mazin Gilbert, a panelist from AT&T research labs, argued that dialog evaluation metrics should focus more on customer experience than on dialog completion. He also stressed the importance of natural language generation, noting that "people do not like automated hand-crafted responses".

Panelist Professor James Allen predicted that dialog applications will shift from call centers to personal devices, which will require systems that can engage in more of a "collaborative back-and-forth" with the user. It would not be easy to adapt VoiceXML technology to such systems.

Professor Alex Rudnicky pointed out that "no fancy dialog management matters if the Word Error Rate (WER) is too high". Yet an acceptable WER (below 15%) is not always achievable in noisy conditions.
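To make the 15% figure concrete: WER is conventionally computed as the word-level edit distance between the recognizer output and a reference transcript (substitutions + deletions + insertions), divided by the number of words in the reference. The sketch below is a minimal illustration of that standard definition, not code discussed at the workshop; the function name `word_error_rate` and the example utterances are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words,
    computed as a word-level Levenshtein (edit) distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dist[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match if cost == 0)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


# Hypothetical 8-word reference utterance.
ref = "i want the schedule for the dialog workshop"
# One substitution ("the" -> "a") and one deletion ("the"): 2 / 8
print(word_error_rate(ref, "i want a schedule for dialog workshop"))    # 0.25
# One substitution only: 1 / 8
print(word_error_rate(ref, "i want the schedule for a dialog workshop"))  # 0.125
```

In this toy example, two errors in an eight-word utterance already give 25%, well above the 15% threshold mentioned above, while a single error stays below it. Because insertions are counted, WER can exceed 100% for very noisy output.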

The perennial question facing dialog system designers was also raised: "Should the system's automatic speech recognition be statistical or grammar-based?" Grammar-based systems are known to perform better for experienced users, while statistical systems provide more coverage and are better at accommodating novices. Panelists discussed the advantages and disadvantages of both approaches. They concluded that, depending on the application, there is room for both techniques, and that a clever combination of the two could lead to better performance.

Another question raised was: "What is the role of machine learning (ML) in dialog management?" Although ML is a powerful technique, there are doubts about its scalability. The panelists pointed out that ML solves tuning issues, but a human still has to design the structure of the problem. Panelist Professor Michael McTear noted that POMDPs (partially observable Markov decision processes, an ML technique currently used for dialog management) do not resemble how humans handle conversations. He also pointed to the difficulty that dynamically evolving dialogs (where the state space is not known in advance) pose for POMDP learning.

Panelists discussed dialog system applications of the near future, including smart answering machines, games (it is surprising that natural language dialog has not yet penetrated the game market!), assistants for elderly people, and dialog interfaces to PC software.

The audience was also interested in standardization of dialog system components. In an ideal world, standardization would allow researchers to publish components as web services and swap them transparently. Although everyone could see the benefits of standardization, it is difficult to see how concepts such as semantics could be generalized: "the best we can do is to standardize meta-data," commented the panelists.

Panel on collecting dialog data for the community

The workshop concluded with the "Panel on Spoken Dialog Corpus Composition and Annotation for Research," which discussed a new dialog dataset for the research community.

Researchers from AT&T and CMU are collaborating to create a corpus of human-computer dialogs in the domain of conference information. Two systems were developed for this task: CMU's ConQuest (developed with the distributed Ravenclaw/Olympus architecture) and AT&T's DiSCoH. Both systems were deployed at SLT 2006 and Interspeech 2007 to collect data from conference participants. Once collected, the dataset will be freely available to everyone.

The panel discussed the questions "How much data do we need?" and "What should the annotations be?" Although "the more the better" applies to both the amount of data and the amount of annotation, both come at a cost. The consensus was to collect as much data as possible with automatic annotations and make it available to everyone. Researchers using this data are encouraged to make their annotations available to the community. A flexible, layered annotation scheme is important to allow contributions from different sources.

The richness of the conference information domain was also questioned: how deep is the interaction in this domain, and does it involve anaphora and deictic expressions?

Although the domain had already been selected, the panelists also considered which other domains would interest researchers. One of the proposed domains was "systems for users with real information needs".

If you would like to contribute your opinion on the questions discussed in the panel, please fill out this survey.


 
SLTC Home   |    IEEE Home   |   Privacy & Security   |    Terms & Conditions