
IEEE Signal Processing Society
Speech & Language Technical Committee


Young, Mariani, and Moore Talk to Saras Institute

BY NAVEEN PARIHAR, SYLVIE SAGET & ANTONIO ROQUE

We continue the series of excerpts of interviews from the History of Speech and Language Technology Project. In these segments Steve Young, Joseph Mariani, and Roger K. Moore discuss how they became involved with the field of speech and language technology.

These interviews were conducted by Dr. Janet Baker in 2005 and are being transcribed by members of ISCA-SAC, as described in the last newsletter.

Naveen Parihar transcribed the Steve Young excerpt
Sylvie Saget transcribed the Joseph Mariani excerpt
Antonio Roque transcribed the Roger K. Moore excerpt
Syxtus Gaál is coordinating the transcription efforts
 

Steve Young

Q: How did you get into this field?

A:  Well, sort of by accident. I graduated from Cambridge in 1973. And I went off to work for a company called GEC, which was not GE; it's different from GE. It's the British version, but they were the biggest UK electronics company. I got to work in high-speed digital communications, which was OK, but I worked there for about a year and I thought I did OK, and I produced this long report on the project I'd been working on and my boss sort of looked at it, said "That's very good" and put it on his shelf. And I think that was the last time that he looked at it. And I thought, "I do not feel particularly motivated by this." (laughs) So, I went back to Cambridge and did a Ph.D., and Frank Fallside was just starting to work on speech processing. He was a control engineer and he had some interesting projects. And I chose a project in speech synthesis. And that's how I started. I remember the very first day I arrived back at Cambridge to do my Ph.D., Frank said to me "Ah, Steven, umm, we've got a little job for you before you start. In fact, we have an appointment at Addenbrooke's in about 50 minutes. Could you manage that?" I said "OK," so I climbed into his car and we whisked off to Addenbrooke's, and I hadn't a clue what was going to happen.

Q: From Cambridge?

A:  Yes, it is a local hospital. I got taken into this, basically an X-ray department, where they started to inject barium paste into all my vocal orifices and started asking me to say these things. Actually he was working on vocal tract estimation, and so he wanted X-rays of people speaking various sounds, and in those days they were hard to come by. And the dosage levels were such that you could only do it once with the technology at that time. So, about an hour, probably exaggerating a little, two hours after starting my Ph.D., I was sitting in the hospital (laughs) with my mouth and nose full of barium paste, valiantly trying to say these sounds that I had never heard before. The idea of a phonetic alphabet, the IPA and so on, was completely alien to me. And I thought "Wow, this is real science! Suffering for your--", you know... And that was probably the first and last time I did any serious experimental speech science. But that was my start.

Q: Well, that was really baptism by fire! So, he had already scheduled you without checking this out with you?

A:  Well, no, he scheduled somebody, he had some slots. But I think I was top of his list.

Q: Guinea pig for the day? (laughs)

A:  Guinea pig for the day.

 

Joseph Mariani

Q: How did you get into this field? How did you get your start?

A:   OK. So, how did I get into Speech Science and Technology? OK, so this was back in the early seventies, 1973 to be exact. I got a diploma from an engineering school in aerospace, which is far away from this field, but I was not too interested in aeronautics or space technology. I did my studies in Toulouse, in the South of France, and then I went to Paris and decided to do a PhD thesis in the field-- well, initially I was looking at computer music. That was the activity of most interest to me. So I went to a university in Paris and found a laboratory working in that field. But they told me that if I wanted to work on music it was OK, they would do music, but they didn't have any computer facilities. This was back in the early seventies. But they said, well, there is a new laboratory which is part of us but which moved to the south of Paris, to Orsay. And this laboratory was LIMSI. They said: there, they don't work on music, but they work on computers for language processing, for speech processing. And this was the offer they made. So I went to Orsay and met Jean-Sylvain Lienard, who was leading the speech activities there, and I started my PhD thesis with him. So this was the way I joined activities on speech.

Q: So it was the closest you could get to music processing...

A:   Yes. Those were the most important things for me: the music part and the computer, but music was not possible at that time. So, I used the computer. At that time it was an IBM 1130, which was already something for the time.

Q: That was a Fortran machine.

A:   Yes, with Fortran of course. But it was not powerful enough for doing what I wanted to do. I mean, I was already addressing the topic of continuous speech processing. Actually, my topic was to realize a very low bit-rate vocoder at 50 bits per second. The idea was to recognize the phonemes, transmit the phonemes, and re-synthesize them. This is 50 bits per second, so very, very low bandwidth. So I didn't have enough computing power. What I did was use this IBM 1130, which was in the laboratory, for all the signal processing aspects, you know, in real time. Because it was only for us, I could have the whole computer for myself. But then there was a connection with a bigger IBM 370/168, a big time-sharing computer at the CNRS computing facility called CIRCE, which was in a big building maybe 100 meters from LIMSI. But there were hundreds of users at a time. So it was very interesting in terms of computer engineering, mixing a real-time activity on a small computer with a more powerful but time-shared activity on a much larger computer. So this was fun, I must say, and I succeeded in doing something in real time from the signal to the recognition.

 

Roger K. Moore

Q: We're wondering how you got into speech?

A:  OK... I was at university in Colchester, at the University of Essex in the UK, doing a course on computers in communications engineering. And actually at Essex in those days, it was a very small UK university, but rather than having a lot of small departments it had a few very big departments so that, in other words, it had respectable expertise in those different areas. So the electrical engineering department which I was in was one of the best in the country, and they had a modest amount of speech activity, I subsequently discovered. I didn't realize that when I was an undergraduate, but when it came to choosing a third-year project and we were presented with a list of things we might aspire to do, I looked and I couldn't see anything that was of any great interest to me. So I recall we had the summer break to make up our minds, and we were allowed to come up with suggestions of our own... so I went away and I thought: well, I'm not interested in those, so I want to do something pretty easy... (laughs) So I thought: well, you know, probably nobody's ever thought of building a machine that you could talk to, that would recognize what people say, and that sounds pretty easy, so I'll suggest that!

Q: What year was that?

A:   That was 1972... I think, or '73, thereabouts. And our projects were actually done in pairs, and I had a friend who actually lived local to me in the UK who was in the same department on the same course, and so we decided to team up. He thought it was quite a good idea, and also thought it would be quite easy. (laughs) He did the hardware, and I did the software on a PDP-8/E. I was the first person to use the extended memory, in other words to write a program that was bigger than 4K. (laughs) It was all paper-tape programming, quite exciting stuff. And this is, you'll probably like this, because the recognizer was called Electronic Apparatus for Recognizing Speech: EARS! So I was quite amused when the DARPA program of that name appeared relatively recently. And in my office at Sheffield I actually do have the aluminum front-plate from that recognizer which we built. And it worked, by the way!

Q: Yes, what did it do?

A:   It was isolated words. It had a filter-bank front-end, which was constructed from real filters, and the filter-bank was quite interesting: it was wired up using what was called a 'crab's eye' arrangement, if you've ever heard of that. It was a scheme that was used to... basically, they were special filters which have nulls at the frequencies of the adjacent channels. And it's used in touch-tone, I think; I'm not absolutely sure of that. It was a long time ago, and basically it did a lot of suppression. So whether that gave us a gain or not we don't know, because of course we couldn't exactly unwire it and test! But I always thought it would be an interesting experiment to go back and have a play... It was simple template-matching and a small vocabulary. I think we built little applications like a voice calculator, so that we could speak sums and multiplications and things and it would do it all...


 