Audio Mining - Build a solution for searching inside your audiovisual data
The Audio Mining component from Fraunhofer IAIS (http://www.iais.fraunhofer.de) extends the Eclipse SMILA framework (http://www.eclipse.org/smila/) to audio and audiovisual documents. Using a state-of-the-art automatic speech recognition (ASR) system, the component converts spoken German into text. The result can then be processed in the same way as written documents such as web pages, emails, or PDFs.
Audio Mining covers the complete workflow required for speech recognition: segmentation of the audio stream into homogeneous parts, speech detection, speaker diarization, and the actual speech recognition. This lets the user access not only the text that was spoken but also who said it and when. A sample search GUI that showcases possible uses is shown in the screenshot.
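To illustrate what "who said it and when" can mean for a search application, the following minimal Python sketch models the output of such a workflow as a list of time-stamped, speaker-labelled segments and searches them for a term. The `Segment` class, the `search` function, the speaker labels, and the sample data are all hypothetical and chosen for illustration only; they are not the component's actual data model or API.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One recognized stretch of speech (hypothetical structure)."""
    start: float   # start time in seconds
    end: float     # end time in seconds
    speaker: str   # label assigned by speaker diarization
    text: str      # transcript produced by speech recognition


def search(segments, term):
    """Return (speaker, start, end) for every segment whose text contains term."""
    term = term.lower()
    return [
        (s.speaker, s.start, s.end)
        for s in segments
        if term in s.text.lower()
    ]


# Invented sample data standing in for real analysis results.
segments = [
    Segment(0.0, 4.2, "speaker_1", "Guten Morgen und willkommen"),
    Segment(4.2, 9.8, "speaker_2", "Heute sprechen wir über Spracherkennung"),
]

hits = search(segments, "spracherkennung")
for speaker, start, end in hits:
    print(f"{speaker} said it between {start:.1f}s and {end:.1f}s")
```

Because every segment carries its own time range and speaker label, a search hit can point directly to the matching position in the recording rather than to the document as a whole.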
To analyse large collections or archives, the Audio Mining component includes its own controller, which schedules analysis tasks across multiple processing modules. These modules can be distributed over several machines, which makes the system easy to scale.
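The scheduling idea can be sketched in a few lines of Python. This is only an analogy under stated assumptions: a local thread pool stands in for processing modules that would run on separate machines, and `analyse`, `run_controller`, and the file names are hypothetical placeholders rather than parts of the actual controller.

```python
from concurrent.futures import ThreadPoolExecutor


def analyse(path):
    """Placeholder for one processing module handling one media file."""
    # A real module would run segmentation, diarization, and recognition here.
    return {"file": path, "status": "done"}


def run_controller(paths, num_modules=2):
    """Distribute analysis tasks over a fixed number of worker modules."""
    with ThreadPoolExecutor(max_workers=num_modules) as pool:
        # map() returns results in the same order as the input paths.
        return list(pool.map(analyse, paths))


# Hypothetical file names used only for illustration.
results = run_controller(["a.wav", "b.wav", "c.wav"])
for result in results:
    print(result["file"], result["status"])
```

In this sketch, raising `num_modules` corresponds to adding processing modules: the list of tasks stays the same, and only the number of workers consuming it changes.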
Because the performance of an ASR system varies considerably with the domain of the documents as well as with acoustic conditions and quality, the statistical models used for the analysis can be adapted to the users' needs.