Large-Vocabulary Speech Recognition Using Morph-Based Language Models

Mikko Kurimo

ICSI

Tuesday, April 10, 2012
12:30

Automatic speech recognition (ASR) systems trained for agglutinative and morphologically rich languages face the problem of vocabulary growth caused by prefixes, suffixes, inflections, and compound words. The typical solutions are to increase the vocabulary size or to segment words into smaller units called morphs. The morphs can be either language-dependent grammatical morphemes or statistical morphs discovered by language-independent unsupervised machine learning methods. The sub-word units are helpful for recognizing out-of-vocabulary (OOV) words, but in language models based on n-grams, short units often require a large n. Our morph-based ASR experiments in two highly inflecting and agglutinative languages show that high-order models are also essential for constructing lattices in the first recognition pass. The analysis of recognition errors reveals improvements not just for OOVs but also for previously unseen words. Further error analysis indicates how speaker-adaptive training and discriminative training correct some of the ASR errors, and which errors remain as the most promising areas for improving the system. In addition to ASR, morph-based language modeling is also useful for speech synthesis, information retrieval, and statistical machine translation.

Bio:
Mikko Kurimo is the leader of the Speech Group at the Information and Computer Science Department at Aalto University: http://research.ics.tkk.fi/speech. After his PhD at the Helsinki University of Technology (which later became part of Aalto) and a postdoc at IDIAP, he became a professor and chief research scientist at Aalto. In addition to IDIAP, he has visited and worked with several research centers: University of Colorado at Boulder, University of Edinburgh, SRI, Cambridge University, and Nagoya Institute of Technology. His group at Aalto consists of 12 researchers who concentrate on several aspects of speech recognition and are funded mainly by projects of the Academy of Finland, Tekes, and the European Community. Their work is best known for unsupervised morpheme-based language modeling and for the adaptation of acoustic and language models to various speakers and speaking styles.

In 2012, Mikko Kurimo and his PhD student Seppo Enarvi are visiting ICSI, working on ASR for spontaneous speech.