
IEEE Signal Processing Society
Speech & Language Technical Committee


Voices Across Birmingham

BY MARTIN RUSSELL

One of the characteristics of British English is the range and diversity of its regional accents and dialects.  Although it is acknowledged that inter-accent variation has implications for speech technology, most of the evidence to date is anecdotal.  The dearth of hard experimental evidence on how accent affects, for example, automatic speech recognition performance is due mainly to an absence of suitable data.  For this reason, in 2003 Aurix Limited funded the University of Birmingham to create the "Accents of the British Isles" (ABI) corpus, comprising approximately 100 hours of transcribed recordings of accented English from 14 different locations in the British Isles (Elgin (Scottish Highlands), Glasgow, Newcastle, Ulster, Dublin, Liverpool, Hull (East Yorkshire), Burnley (Lancashire), Denbigh (North Wales), Birmingham, Lowestoft (East Anglia), Truro (Cornwall) and Inner London).  In each town or city the goal was to record ten men and ten women who were born there and had lived there all their lives.  ABI also contains recordings of 'Standard Southern English'.

In 2006 the University created a spin-out company, The Speech Ark, as a vehicle for creating further corpora.  Its first project, ABI-2, was to record a new corpus that extends the original ABI corpus with thirteen new regional accents (Edinburgh, Hartlepool, Leeds, Stoke-on-Trent, Coalville (Leicestershire), Shrewsbury (Shropshire), Hereford, Caernarfon (North Wales), Cardiff, Bristol, Yeovil (West Country), Gornall (Black Country) and Southend-on-Sea).  Together, the two ABI corpora contain almost 200 hours of recordings.  They provide a unique resource for speech science and technology research and a ‘snapshot’ of British English regional accents at the start of the 21st century.

The Speech Ark's most recent project is "Voices across Birmingham".  The objective is to record 200 hours of conversational telephone speech between people from the West Midlands.  This is a multi-cultural community: according to the most recent census, in 2001 the broad ethnic background of 20% of the population was Asian.  One of the challenges of "Voices across Birmingham" is to represent this diversity.  Once the Birmingham corpus is complete, The Speech Ark plans to create similar corpora for other regions of the British Isles.

For further information see The Speech Ark.

 


 