Privacy-Sensitive Audio Features for Conversational Speech Processing
Hari Parthasarathi
ICSI
Tuesday, February 07, 2012
12:30pm
This work takes place in the context of capturing real-life audio for the analysis of spontaneous social interactions. Toward this goal, we wish to capture conversational and ambient sounds using portable audio recorders. Analysis of conversations can then proceed by modeling the speaker turns and durations produced by speaker diarization. However, a key obstacle to the ubiquitous capture of real-life audio is privacy: recording and storing raw audio would breach the privacy of people whose consent has not been explicitly obtained.
Instead, we study audio features that respect privacy by minimizing the amount of linguistic information they carry, while achieving state-of-the-art performance on conversational speech processing tasks such as speech/nonspeech detection and speaker diarization. We provide a comprehensive analysis of these features under a variety of conditions, including predominantly indoor as well as outdoor audio. To evaluate the notion of privacy objectively, we use human and automatic speech recognition tests, interpreting higher recognition accuracy as lower privacy.
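To make the idea of a privacy-sensitive feature concrete, the following is a minimal illustrative sketch (not the speaker's actual method): per-frame log energy carries far less phonetic content than a full spectrum, yet is still enough for a crude speech/nonspeech decision. The frame sizes, the energy threshold, and the `detect_speech` helper are all illustrative assumptions.

```python
import numpy as np

def frame_log_energy(signal, frame_len=400, hop=160):
    """Per-frame log energy: a low-linguistic-content feature.
    Defaults assume 16 kHz audio (25 ms frames, 10 ms hop)."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        # Small constant avoids log(0) on silent frames.
        energies.append(np.log(np.sum(frame ** 2) + 1e-10))
    return np.array(energies)

def detect_speech(energies, margin=5.0):
    """Label a frame as speech when its log energy exceeds the
    noise floor (the minimum frame energy) by a fixed margin."""
    threshold = energies.min() + margin
    return energies > threshold

# Toy example: a quiet signal with a louder segment in the middle.
sr = 16000
t = np.arange(sr) / sr
sig = 0.001 * np.sin(2 * np.pi * 440 * t)          # background
sig[4000:12000] = 0.5 * np.sin(2 * np.pi * 440 * t[4000:12000])  # "speech"
labels = detect_speech(frame_log_energy(sig))
```

Because only one scalar per frame is stored, the original waveform (and hence the spoken words) cannot be reconstructed, which is the intuition behind trading linguistic content for privacy.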
Bio:
SHK (Hari) Parthasarathi has joined the Speech Group as a postdoc and is working on the OUCH project. He completed his PhD at Idiap and EPFL in Switzerland; his thesis was on audio features that respect privacy while capturing spontaneous conversations. Specifically, he studied features that carry little linguistic information yet support speech activity detection and speaker diarization. He earned his master's at IIT Madras with a thesis on the robustness of group delay functions to additive noise, and also had a stint with Honeywell (Bangalore), working on a few "small to medium" software projects.