Question Answering at TREC and Beyond
BY SVETLANA STENCHIKOVA
Question answering (QA) has
become a popular application in the field of natural language processing. The
question answering task focuses on returning a concise answer to a natural
language question, rather than returning a set of documents or snippets in
response to a query, as search does.
Question Answering at TREC
The Text REtrieval Conference (TREC) is a yearly conference that organizes several
competition tracks for different applications of Information Retrieval. TREC
had a Question Answering track from 1999 until 2007. At TREC, evaluations
are conducted on several different types of questions. For example, factoid
questions ask for a concise fact (who, what, when, where), and
list questions ask for a list of answers, e.g. a list of cities or people.
Definition questions require a more general answer or a summary. In 2006, TREC
introduced a complex interactive question answering (ciQA) task where a system
may clarify a question with a user. The ciQA task supports several question
templates, like "What evidence is there for transport of [object] from [entity]
to [entity]?" The ciQA questions were accompanied by nuggets. To evaluate these
questions, assessors interacted for five minutes per topic with each system.
There were no restrictions on the nature of the interaction, and seven of the
systems participating in the interactive QA featured a variety of interaction
strategies. A good overview of the 2007 QA track at TREC can be found
here.
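As a small illustration of the template idea, the sketch below fills the bracketed slots of the ciQA template quoted above, in order. The function name fill_template and the example fillers are hypothetical and are not taken from the TREC data; this is only one simple way to instantiate such a template.

```python
import re

# Template quoted in the article; the slot names appear in square brackets.
TEMPLATE = ("What evidence is there for transport of [object] "
            "from [entity] to [entity]?")

def fill_template(template, fillers):
    """Replace each [slot] in the template, left to right, with the next filler."""
    slots = re.findall(r"\[\w+\]", template)
    if len(slots) != len(fillers):
        raise ValueError(
            f"template has {len(slots)} slots but {len(fillers)} fillers were given")
    remaining = iter(fillers)
    return re.sub(r"\[\w+\]", lambda match: next(remaining), template)

# Hypothetical fillers, chosen only for illustration.
question = fill_template(TEMPLATE, ["grain", "Ukraine", "Egypt"])
print(question)
```

Filling slots by position keeps the sketch short; a real system would more likely use named slots so that the two [entity] positions cannot be confused.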
A similar trend in results among the top 10 systems appears from
year to year in the TREC QA competition. There is generally a clear winner and a
runner-up with a score in the 70's, while the rest of the systems score in
the low 20's. The current evaluation computes the
F-measure
(a combination of precision and recall) of the first answer returned by the system.
The scores of the top 10 systems for factoid questions last year were: 0.706,
0.494, 0.289, 0.258, 0.256, 0.236, 0.222, 0.222, 0.208, and 0.206.
The top two systems came from the Lymba Corporation, a spinoff of the Language
Computer Corporation (LCC), and from LCC itself. LCC has been the
undisputed leader in the past several years of TREC. All of the university
systems in the 2007 competition performed below 30% F-measure for the "easy"
factoid questions.
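For readers unfamiliar with the F-measure, here is a minimal sketch of the standard definition: the weighted harmonic mean of precision and recall. It is the textbook formula, not necessarily the exact scoring procedure used by the TREC assessors, and the function name f_measure and the beta parameter are choices made for this example.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F-beta).

    beta = 1.0 weights precision and recall equally (the usual F1 score).
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    beta_sq = beta * beta
    denominator = beta_sq * precision + recall
    if denominator == 0:
        # Both terms are zero: nothing correct was returned or expected.
        return 0.0
    return (1 + beta_sq) * precision * recall / denominator

# Illustrative values only, not TREC results.
print(round(f_measure(0.8, 0.6), 3))
```

Because the harmonic mean is dominated by the smaller of the two values, a system cannot reach a high F-measure by doing well on precision alone or on recall alone.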
The QA track is no longer a
part of TREC in 2008 but will continue in a new evaluation hosted by NIST
called the Text Analysis Conference, or TAC (web site currently under
construction, to be announced soon).
Pushing the State of the Art in QA
Question answering is a great
platform for evaluating virtually all areas of NLP research. A variety of NLP
tasks can be utilized in question answering: part-of-speech tagging,
sentence chunking, deep and shallow parsing, information extraction,
co-reference resolution, text indexing, web crawling, and efficient text
processing. Question answering also provides a platform for speech research,
including language modeling for question recognition, query grounding, question
classification, and adaptation to the user. Several major conferences have
featured demos on speech-enabled question answering (HLT 2006,
ASRU 2005). Over its eight years, the TREC QA track has
gathered over 4000 question and answer pairs, a valuable resource that may be
used by the research community. The resources from past years can be freely
downloaded from TREC's webpage.
It is relatively simple to create
a basic QA system that performs in the 20's using off-the-shelf components for
tagging and named-entity detection. However, to improve the system and achieve
higher performance, one needs expertise in multiple fields. The fact that at
least one system achieved over 70% precision shows the feasibility of the task.
Competition is very important for progress, but I think that introducing
collaboration may be a beneficial strategy for a complex task like Question
Answering. The GALE project (described in a
previous
newsletter article) is based on the collaboration of multiple institutions
bringing their specific expertise to the project. The GALE project focuses on
information extraction for military analysts. A great potential contribution to
the NLP research community would be to start a collaborative Question Answering
initiative: to create and host an open-source QA system that would allow
researchers to evaluate the specific components of their expertise. A similar
type of collaborative research effort was recently started by the Carnegie
Mellon University Speech Lab, where the Let's Go dialogue system was made available
for use by the larger research community. If a company or an institution takes
up a collaborative QA initiative, the score of the collaboratively developed
systems will have a chance to catch up with and potentially surpass today's
leading scores.
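To make the earlier remark about a basic QA system more concrete, here is a minimal sketch of the kind of baseline that off-the-shelf components allow. It assumes that some named-entity tagger has already produced (text, entity type) pairs from retrieved passages; the ANSWER_TYPE mapping, the type labels, and the function names are hypothetical and do not describe any TREC participant's system.

```python
from collections import Counter

# Hypothetical mapping from question word to the entity type expected in the answer.
ANSWER_TYPE = {"who": "PERSON", "where": "LOCATION", "when": "DATE"}

def expected_answer_type(question):
    """Guess the expected entity type from the first word of the question."""
    words = question.strip().split()
    first_word = words[0].lower() if words else ""
    return ANSWER_TYPE.get(first_word)

def answer(question, tagged_entities):
    """Return the most frequent entity of the expected type, or None.

    tagged_entities is a list of (text, entity_type) pairs, assumed to come
    from a named-entity tagger run over the retrieved passages.
    """
    wanted = expected_answer_type(question)
    if wanted is None:
        return None
    question_lower = question.lower()
    # Skip entities already mentioned in the question itself.
    counts = Counter(
        text for text, entity_type in tagged_entities
        if entity_type == wanted and text.lower() not in question_lower
    )
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Hypothetical tagger output for a few retrieved passages.
entities = [("Paris", "LOCATION"), ("1889", "DATE"),
            ("Gustave Eiffel", "PERSON"), ("Paris", "LOCATION"),
            ("London", "LOCATION")]
print(answer("Where is the Eiffel Tower?", entities))
```

A baseline like this ignores parsing, co-reference, and answer validation, which is where the multi-disciplinary expertise discussed above comes in.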