The Kaldi Speech Recognition Toolkit; and "Perfect" Lattice Generation

Dan Povey

Microsoft Research

Tuesday, January 24, 2012
12:30pm

Dan will talk about the open-source speech recognition toolkit "Kaldi." Topics covered include the history of the project, the overall design of the toolkit, the use of Weighted Finite State Transducers (WFSTs), mechanisms for dealing efficiently with large collections of data, the use of lattices, decoding-graph construction, and the algorithms used in the training recipes. He will also talk about lattice generation for speech recognition, and describe how to generate "perfect" lattices efficiently, using a special semiring.

Bio:
Daniel Povey received his Bachelor's (Natural Sciences, 1997), Master's (Computer Speech and Language Processing, 1998), and PhD (Engineering, 2003) from Cambridge University. From 2003 to 2008, he worked as a researcher at IBM Research in Yorktown Heights, New York. He is best known for his work on discriminative training for HMM-GMM-based speech recognition: MMI, MPE, fMPE/fMMI, and boosted MMI. He is currently working at Microsoft Research.