Automatic Transcription of Speech
EML European Media Laboratory GmbH (http://www.eml-d.de/) has combined its speech transcription platform with the SMILA framework to enable the automatic conversion of audio into text. The EML Transcription Server is a highly scalable transcription solution that provides very-large-vocabulary, speaker-independent speech transcription for several languages and acoustic domains. The recognition results include the text as well as the corresponding timestamps and confidence values. These can be used to build sophisticated search and retrieval solutions, such as indexing large audio documents (for example, podcasts) by their spoken content rather than only by predefined tags or keywords.
The SMILA framework provides a wealth of technologies and features for developing powerful document processing and text mining solutions. The EML SMILA Transcription Pipelet integrates the EML Transcription Platform into the SMILA framework. It automatically converts audio data into a textual representation, which makes the audio content available to existing text processing solutions. Because the recognition result also contains timing information, search hits can be mapped to their exact position within an audio or video document, which makes them more relevant. Furthermore, the EML SMILA Transcription Pipelet provides interfaces and tools that allow partners to create or further customize their own language and acoustic models from within SMILA. Adapting the models to the partner's application domain significantly improves the speech recognition results, which in turn gives users better search results.
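The idea of searching audio by its spoken content, using the timestamps and confidence values in the recognition result, can be illustrated with a small inverted index. This is a minimal Python sketch under stated assumptions, not the EML or SMILA API: the word-level record `RecognizedWord`, the functions `build_index` and `search`, and the default confidence threshold of 0.5 are all hypothetical names and values chosen for illustration. Words recognized with low confidence are left out of the index, and every hit points back to a document and a start time in seconds.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RecognizedWord:
    """Hypothetical word-level recognition result (assumed format, not EML's)."""
    text: str
    start: float       # start time in seconds
    end: float         # end time in seconds
    confidence: float  # recognizer confidence, assumed to lie in [0.0, 1.0]


def build_index(documents, min_confidence=0.5):
    """Build an inverted index: lower-cased word -> list of (doc_id, start time).

    `documents` maps a document id (e.g. a podcast episode) to its list of
    RecognizedWord objects. Words below `min_confidence` are not indexed.
    """
    index = defaultdict(list)
    for doc_id, words in documents.items():
        for w in words:
            if w.confidence >= min_confidence:
                index[w.text.lower()].append((doc_id, w.start))
    return index


def search(index, term):
    """Return every (doc_id, start time) at which `term` was recognized."""
    # .get() does not insert a new key into the defaultdict on a miss.
    return index.get(term.lower(), [])


if __name__ == "__main__":
    episodes = {
        "episode-01": [
            RecognizedWord("Welcome", 0.0, 0.4, 0.97),
            RecognizedWord("to", 0.4, 0.5, 0.95),
            RecognizedWord("SMILA", 0.5, 1.1, 0.88),
        ],
        "episode-02": [
            RecognizedWord("smila", 12.3, 12.9, 0.91),
            RecognizedWord("pipelets", 13.0, 13.7, 0.42),
        ],
    }
    idx = build_index(episodes)
    print(search(idx, "smila"))     # [('episode-01', 0.5), ('episode-02', 12.3)]
    print(search(idx, "pipelets"))  # [] because 0.42 is below the threshold
```

A player could use the returned start time to jump directly to the spoken occurrence of a term. The confidence threshold trades recall for precision; a real system might instead store the confidence and use it for ranking.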