IEEE Signal Processing Society
Speech & Language Technical Committee


Question Answering at TREC and Beyond

BY SVETLANA STENCHIKOVA

Question answering (QA) has become a popular application in the field of natural language processing. The QA task focuses on returning a concise answer to a natural-language question, rather than a set of documents or snippets in response to a query, as a search engine does.

Question Answering at TREC

The Text REtrieval Conference (TREC) is a yearly conference that organizes several competition tracks for different applications of information retrieval. TREC had a Question Answering track from 1999 until 2007. At TREC, evaluations are conducted on several different types of questions. For example, factoid questions ask for a concise fact (who, what, when, where), and list questions ask for a list of answers, e.g. a list of cities or people. Definition questions require a more general answer or a summary. In 2006, TREC introduced a complex interactive question answering (ciQA) task in which a system may clarify a question with a user. The ciQA task supports several question templates, such as "What evidence is there for transport of [object] from [entity] to [entity]?" The ciQA questions were accompanied by nuggets, the atomic pieces of information against which answers are judged. To evaluate these questions, assessors interacted with each system for five minutes per topic. There were no restrictions on the nature of the interaction, and 7 of the systems participating in the interactive QA featured a variety of interaction strategies. A good overview of the 2007 QA track at TREC can be found here.

A similar pattern among the top 10 systems appears from year to year in the TREC QA competition. There is generally a clear winner, with a score in the 70s, and a runner-up, while the rest of the systems score in the 20s. The current evaluation computes the F-measure (a combination of precision and recall) of the first answer returned by the system. The scores of the top 10 systems for factoid questions last year were: 0.706, 0.494, 0.289, 0.258, 0.256, 0.236, 0.222, 0.222, 0.208, and 0.206. The top two systems came from Lymba Corporation, a spinoff of the Language Computer Corporation (LCC), and from LCC itself. LCC has been the undisputed leader at TREC for the past several years. All of the university systems in the 2007 competition scored below 30% F-measure on the "easy" factoid questions.
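As a rough illustration of the F-measure mentioned above, here is a minimal sketch. It assumes the standard balanced F-measure (the harmonic mean of precision and recall); the exact TREC scoring formula is not given in this article, and the counts below are hypothetical rather than taken from any TREC run.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing is correct
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts for one system (assumptions, not TREC data).
correct = 30      # first answers judged correct
returned = 40     # questions for which the system returned an answer
answerable = 50   # questions that have a known answer

precision = correct / returned   # fraction of returned answers that are correct
recall = correct / answerable    # fraction of answerable questions answered correctly
f1 = f_measure(precision, recall)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Because the harmonic mean is pulled toward the smaller of the two values, a system cannot reach a high F-measure by answering only the few questions it is sure about, nor by guessing on every question.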

The QA track is no longer a part of TREC in 2008, but it will continue in a new evaluation hosted by NIST called the Text Analysis Conference, or TAC (web site currently under construction, to be announced soon).

Pushing the State of the Art in QA

Question answering is a great platform for evaluating virtually all areas of NLP research. A variety of NLP tasks can be put to use in question answering: part-of-speech tagging, sentence chunking, deep and shallow parsing, information extraction, co-reference resolution, text indexing, web crawling, and efficient text processing. Question answering also provides a platform for speech research, including language modeling for question recognition, query grounding, question classification, and adaptation to the user. Several major conferences have featured demos of speech-enabled question answering (HLT 2006, ASRU 2005). Over its eight years, the TREC QA track has gathered over 4000 question-answer pairs, a valuable resource for the research community. The resources from past years can be freely downloaded from TREC's webpage.

It is relatively simple to create a basic QA system that scores in the 20s using off-the-shelf components for tagging and named-entity detection. However, improving the system and achieving higher performance requires expertise in multiple fields. The fact that at least one system scored over 70% shows the feasibility of the task. Competition is very important for progress, but I think that introducing collaboration may be a beneficial strategy for a complex task like question answering. The GALE project (described in a previous newsletter article) is based on the collaboration of multiple institutions, each bringing its specific expertise to the project. The GALE project focuses on information extraction for military analysts. A great potential contribution to the NLP research community would be to start a collaborative question answering initiative: to create and host an open-source QA system that would allow researchers to evaluate the specific components in their area of expertise. A similar collaborative research effort was recently started by the Carnegie Mellon University Speech Lab, where the Let's Go dialogue system was made available for use by the larger research community. If a company or an institution takes up a collaborative QA initiative, the scores of collaboratively developed systems will have a chance to catch up with, and potentially surpass, today's leading scores.
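To make the "basic QA system" mentioned above more concrete, here is a minimal sketch of a factoid pipeline: classify the question by its question word, rank sentences by keyword overlap, and extract a candidate of the expected type. Everything in it is an assumption for illustration: the regular expressions are crude stand-ins for an off-the-shelf tagger and named-entity detector, the tiny answer-type table and stopword list are invented, and the two-sentence corpus is a toy built from facts stated in this article. It does not describe any TREC participant's system.

```python
import re

# Assumed, tiny mapping from question word to expected answer type.
ANSWER_TYPES = {"who": "PERSON", "when": "DATE", "where": "PLACE"}

# Crude regex stand-ins for an off-the-shelf named-entity detector.
PATTERNS = {
    "DATE": re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b"),      # four-digit years
    "PERSON": re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b"),   # two capitalized words
    "PLACE": re.compile(r"\bin ([A-Z][a-z]+)\b"),           # "in <Capitalized>"
}

STOPWORDS = {"who", "when", "where", "what", "did", "was", "is", "the", "a", "of", "in"}


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def answer(question, sentences):
    """Return the typed candidate from the best-matching sentence, or None."""
    words = re.findall(r"[a-z]+", question.lower())
    qword = next((w for w in words if w in ANSWER_TYPES), None)
    if qword is None:
        return None  # question type not covered by this sketch
    pattern = PATTERNS[ANSWER_TYPES[qword]]
    keywords = tokenize(question) - STOPWORDS

    best, best_score = None, 0
    for sentence in sentences:
        score = len(keywords & tokenize(sentence))  # keyword overlap
        match = pattern.search(sentence)
        if match and score > best_score:
            best_score = score
            # Use the capture group if the pattern has one, else the whole match.
            best = match.group(match.lastindex or 0)
    return best


# Toy corpus built from statements in this article.
corpus = [
    "TREC introduced the ciQA task in 2006.",
    "The QA track at TREC started in 1999.",
]
print(answer("When did the QA track at TREC start?", corpus))
```

A system like this breaks down quickly on paraphrased questions, pronouns, and answers spread across sentences, which is where the parsing, co-reference, and information-extraction expertise discussed above comes in.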

 

 