Spectro-Temporal Features for Speaker Recognition

Howard Lei

ICSI

Tuesday, October 18, 2011
12:30 - 2:00pm

2D Gabor features (also known as spectro-temporal features) are a recent development in speech processing. They have been used mainly for automatic speech recognition (ASR), where they have yielded significant word error rate (WER) improvements, especially in noisy recording conditions. The features were derived as an attempt to model certain stimuli to which neurons of the mammalian auditory cortex are sensitive; these stimuli contain both spectral and temporal modulation frequencies. In this work, we investigate the performance of 2D Gabor features for speaker recognition. We’ve explored different Gabor feature implementations and different speaker recognition approaches on the ROSSI and NIST SRE08 databases. The feature implementations differ mainly in how the MLP used for dimensionality reduction of the Gabor features is trained. On the noisy ROSSI database, we’ve obtained relative EER improvements of 13 percent and 20.9 percent for feature-level and score-level combination of MFCC and Gabor features, respectively, over MFCC features alone. Our results show the value of both spectral and temporal information for feature extraction, and the complementarity of Gabor features to MFCC features.
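
For readers unfamiliar with the idea, the following is a minimal sketch of a 2D Gabor filter applied to a time-frequency representation. It is an illustration only and not the implementation used in this work. The function names (gabor_2d, apply_filter), the kernel size, and the choice of envelope width are all assumptions made for the example. The filter is a complex sinusoid with a spectral modulation frequency omega_k and a temporal modulation frequency omega_n, multiplied by a Gaussian envelope.

```python
import numpy as np

def gabor_2d(n_freq, n_time, omega_k, omega_n):
    """Complex 2D Gabor kernel (hypothetical helper, for illustration).

    omega_k: spectral modulation frequency, radians per frequency channel
    omega_n: temporal modulation frequency, radians per frame
    """
    # Coordinates centred on the middle of the kernel.
    k = np.arange(n_freq) - (n_freq - 1) / 2.0
    n = np.arange(n_time) - (n_time - 1) / 2.0
    K, N = np.meshgrid(k, n, indexing="ij")
    # Assumed envelope width: one sixth of the kernel extent per axis.
    sigma_k = n_freq / 6.0
    sigma_n = n_time / 6.0
    envelope = np.exp(-0.5 * ((K / sigma_k) ** 2 + (N / sigma_n) ** 2))
    carrier = np.exp(1j * (omega_k * K + omega_n * N))
    return envelope * carrier

def apply_filter(spec, kernel):
    """Same-size 2D correlation of a spectrogram with a kernel (zero padded)."""
    kf, kt = kernel.shape
    pf, pt = kf // 2, kt // 2
    padded = np.pad(spec, ((pf, kf - 1 - pf), (pt, kt - 1 - pt)), mode="constant")
    out = np.zeros(spec.shape, dtype=complex)
    n_f, n_t = spec.shape
    # Accumulate one shifted, weighted copy of the input per kernel tap.
    for i in range(kf):
        for j in range(kt):
            out += kernel[i, j] * padded[i:i + n_f, j:j + n_t]
    return out
```

In a setup like this, a bank of such kernels with different (omega_k, omega_n) pairs would be applied to a log mel spectrogram, and the magnitude or real part of each output would be kept as a feature map. The resulting feature vector is typically high-dimensional, which is why a dimensionality reduction step such as the MLP mentioned above is needed.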