Publication Details
Title: Acoustic Super Models for Large Scale Video Event Detection
Author: R. Mertens, H. Lei, L. Gottlieb, G. Friedland, and A. Divakaran
Bibliographic Information: Proceedings of the ACM International Workshop on Events in Multimedia (EiMM11), Scottsdale, Arizona
Date: November 2011
Research Area: Audio and Multimedia
Type: Article in conference proceedings
PDF: http://www.icsi.berkeley.edu/pubs/speech/ICSI_acousticsuper11.pdf
Overview:
Given the exponential growth of videos published on the Internet, mechanisms for clustering, searching, and browsing large numbers of videos have become a major research area. More importantly, there is a demand for event detectors that go beyond simply finding objects and instead detect more abstract concepts, such as “feeding an animal” or a “wedding ceremony”. This article presents an approach for event classification that enables searching for arbitrary events, including more abstract concepts, in found video collections based on the analysis of the audio track. The approach does not rely on speech processing and is language-independent; instead, it generates models for a set of example query videos using a mixture of two types of audio features: Linear-Frequency Cepstral Coefficients and Modulation Spectrogram Features. This approach can be used to complement video analysis and requires no domain-specific tagging. Application of the approach to the TRECVid MED 2011 development set, which consists of more than 4000 random “wild” videos from the Internet, has shown a detection accuracy of 64%, including those videos that do not contain an audio track.
Acknowledgements:
This work has been supported by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior National Business Center contract number D11-PC20066. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DOI/NBC, or the U.S. Government.
Bibliographic Reference:
R. Mertens, H. Lei, L. Gottlieb, G. Friedland, and A. Divakaran. Acoustic Super Models for Large Scale Video Event Detection. Proceedings of the ACM International Workshop on Events in Multimedia (EiMM11), Scottsdale, Arizona, November 2011.
