Mighty Oaks from Little ACORNS Grow
BY STEPHEN J. COX
After twenty years of research into
automatic speech recognition (ASR), few would deny that
enormous progress has been made, but equally few would deny
that ASR remains a fragile technology: ASR systems still
perform well below human listeners, especially when noise is
present. Progress continues to be made using the
"conventional" techniques of statistical pattern matching
that have served us well, but a new project, ACORNS, funded
by the European Community, is taking a fundamentally
different approach to the problem. Rather than exploring
more sophisticated ways of modelling speech signals or
classifying speech patterns, ACORNS is attempting to model
the process of speech and language acquisition
itself, by drawing on knowledge of human speech recognition
and cognition.
Within 24 months of birth, a child goes from a
position of "tabula rasa" to being capable of producing an
average of around 250 words, and understanding many more.
The ACORNS team argues that the process of acquisition of
speech and language skills is one of purposeful interaction
between the child and its environment: the child learns how
to understand and respond to speech because it needs to
fulfil a set of goals, and it adapts its learning in
response to feedback on its success. Their system models
some of the key features in this process, and in doing so,
uses a number of techniques that are novel to speech and
language processing. For instance, their system receives
two inputs: a multimodal "message" (speech plus image), and
feedback on its response to the last message. Rather than
using pre-defined units for recognition (e.g. phonemes,
words), they attempt to "discover" patterns within a speech
signal, a process that is simplified by the fact that, when
interacting with infants, we naturally repeat simple
phrases to them. The system also incorporates a model of
memory (based on recent cognitive research), as well as
cognitively plausible ways of forming, storing and
retrieving representations of patterns. All of this
modelling is done within a mathematical framework based on
linear algebra, which puts it on a sound theoretical
footing.
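To make the idea of linking speech patterns to meaning through linear
algebra more concrete, here is a minimal sketch. It is not the ACORNS
method, which the article does not specify: the feature counts, the two
concepts and the function names are all hypothetical, and the feedback
input is left out. It only shows how repeated speech-plus-image messages
can be accumulated into a matrix and then used to respond to a new
utterance.

```python
import numpy as np

def train_associations(speech, concepts, n_concepts):
    """Accumulate co-occurrence between speech features and visual concepts.

    speech:   (n_messages, n_features) non-negative feature counts per utterance
    concepts: (n_messages,) index of the concept shown in the accompanying image
    """
    one_hot = np.eye(n_concepts)[concepts]   # (n_messages, n_concepts)
    counts = speech.T @ one_hot              # (n_features, n_concepts)
    # Normalise each column so that frequently seen concepts do not dominate.
    norms = np.linalg.norm(counts, axis=0, keepdims=True)
    return counts / np.where(norms == 0, 1.0, norms)

def respond(weights, utterance):
    """Return the index of the concept whose learned pattern best matches."""
    return int(np.argmax(utterance @ weights))

# Toy data (assumed): 4 hypothetical acoustic features, 2 concepts
# (0 = "ball", 1 = "dog"), each concept heard in two repeated messages.
speech = np.array([
    [3, 1, 0, 0],
    [2, 2, 0, 1],
    [0, 0, 3, 1],
    [0, 1, 2, 2],
], dtype=float)
concepts = np.array([0, 0, 1, 1])

weights = train_associations(speech, concepts, n_concepts=2)
guess = respond(weights, np.array([2.0, 1.0, 0.0, 0.0]))
```

Here `guess` is the concept index the toy learner picks for an unseen
utterance. A fuller model would also use the feedback signal to adjust
`weights` after each response.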
The goal of the project, by the end of its
three-year span, is to be able to recognize a vocabulary of
about 250 words, including adjectives and present-tense
forms of simple action verbs, from arbitrary speakers, and to
acquire additional words on the basis of a small number of
training tokens. Another important objective is to learn
"word-concept combinations" such as colour names, spatial
relations and adjectives referring to size.
Prof. Roger Moore, one of the project
leaders, said: "One of the real advantages of EU-sponsored
Future and Emerging Technology (FET) type projects is that
they provide a real opportunity to step outside the normal
envelope of research in a particular field. ACORNS is
allowing us to investigate some fundamental aspects of human
speech behaviour, and to build computational models of the
very early stages of word learning in infants. Results are
beginning to emerge that could eventually have implications
for mainstream approaches to speech pattern modelling."