
IEEE Signal Processing Society
Speech & Language Technical Committee


Mighty Oaks from Little ACORNS Grow

BY STEPHEN J. COX

After twenty years of research into automatic speech recognition (ASR), few would deny that enormous progress has been made, but equally few would deny that ASR remains a fragile technology: ASR systems still perform well below humans, especially when noise is present.  Progress continues to be made using the "conventional" techniques of statistical pattern matching that have served us well, but a new project, ACORNS, funded by the European Community, is taking a fundamentally different approach to the problem.  Rather than exploring more sophisticated ways of modelling speech signals or classifying speech patterns, ACORNS is attempting to model the process of speech and language acquisition itself, by drawing on knowledge of human speech recognition and cognition.

Within 24 months, a child goes from a "tabula rasa" to being capable of producing an average of around 250 words, and understanding many more.  The ACORNS team argues that the process of acquisition of speech and language skills is one of purposeful interaction between the child and its environment: the child learns how to understand and respond to speech because it needs to fulfil a set of goals, and it adapts its learning in response to feedback on its success.  Their system models some of the key features of this process, and in doing so, uses a number of techniques that are novel to speech and language processing.  For instance, the system receives two inputs: a multimodal "message" (speech plus image), and feedback on its response to the last message.  Rather than using pre-defined units for recognition (e.g. phonemes, words), the team attempts to "discover" patterns within a speech signal, a process that is simplified by the fact that when interacting with an infant, we naturally repeat simple phrases.  The system also incorporates a model of memory (based on recent cognitive research), as well as cognitively plausible ways of forming, storing and retrieving representations of patterns.  All of this modelling is done within a mathematical framework based on linear algebra, which puts it on a sound theoretical footing.

The goal of the project at the end of its three-year span is to be able to recognize a vocabulary of about 250 words, including adjectives and present tense forms of simple action verbs, from arbitrary speakers, and to acquire additional words on the basis of a small number of training tokens.  Another important objective is to learn "word-concept combinations" such as colour names, spatial relations and adjectives referring to size.

Prof. Roger Moore, one of the project leaders, said: "One of the real advantages of EU-sponsored Future and Emerging Technology (FET) type projects is that they provide a real opportunity to step outside the normal envelope of research in a particular field.  ACORNS is allowing us to investigate some fundamental aspects of human speech behaviour, and to build computational models of the very early stages of word learning in infants.  Results are beginning to emerge that could eventually have implications for mainstream approaches to speech pattern modelling."


 