2008 NIST Speaker Recognition Evaluation
BY ALVIN MARTIN
The 2008 NIST Speaker Recognition Evaluation (SRE08) was
conducted during the spring of 2008. This was an evaluation
of research systems for text-independent automatic speaker
detection in the context of conversational speech, and it
extended a series of such evaluations coordinated by NIST
since 1996.
There were a record 46 sites or multi-site collaborations
submitting systems for the evaluation. Participants came
from around the world, including laboratories in Australia,
Singapore, China, South Africa, Israel, Lebanon, Lithuania,
the Czech Republic, Finland, Slovenia, Italy, Germany,
France, Spain, the Netherlands, the United Kingdom, Mexico,
Canada, and the United States.
The 2008 evaluation also exceeded those held previously in
terms of the amount and types of speech data to be
processed. Participants were required to process not only
conversational telephone speech segments, as previously, but
also conversational speech segments recorded over several
different types of room microphones. A key emphasis of this
evaluation was cross-channel speaker recognition capability:
the ability to recognize a speaker heard over an audio channel
different from the one over which the speaker's known
(training) data had been collected. Moreover, the microphone
data included both simultaneous recordings of telephone
conversations and speech collected in a face-to-face
conversational interview scenario, testing recognition
capability across varying speaking styles.
The conversational telephone data used this year also
included over a thousand talkers, and hundreds of these were
bilingual speakers with conversations in two or more
languages. Speaker recognition in languages other than
English, and particularly in cross-language conditions
between training and test, is a considerable challenge.
The performance of the leading evaluation systems in SRE08
was very impressive. Significant performance improvements
were seen compared with previous evaluations for the
telephone conversational conditions tested previously. These
included tests involving short (ten-second) speech segments
as well as segments several minutes in duration. But most
notable were results suggesting that performance for the new
cross-channel microphone test conditions was often fairly
comparable to that for telephone speech.
A summary of the SRE08 evaluation results will be posted
later this summer on the evaluation web page.