The main function of the LangID tool is the automatic language identification of any text provided as input. It is based on an n-gram approach to language identification and is therefore very fast. It can distinguish among 26 languages:
Catalan, Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Finnish, French, German, Hungarian, Icelandic, Indonesian, Italian, Latvian, Lithuanian, Malay, Norwegian, Portuguese, Romanian, Serbian, Slovak, Slovenian, Spanish, Swedish
The Language Identifier can also learn profiles for new languages from a collection of language-specific documents. It can detect the language of a document based either on its first 30 words or on its whole content. The precision of detecting the right language lies between 98% and 99.5%, depending on the profile size. The latency of the component is about 6 ms when only the first 30 words of a document are considered.
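
The description above does not say which n-gram method LangID uses internally. As an illustration only, the following minimal Python sketch shows one common way such a tool can work: ranked character n-gram profiles compared with an "out-of-place" distance. Everything in it is an assumption made for the example, not the LangID API: the function names (ngrams, learn_profile, distance, identify), the profile size of 300, the n-gram lengths 1 to 3, and the distance measure. Only the two ideas of learning a profile from language-specific documents and of looking at the first 30 words come from the text.

```python
from collections import Counter


def ngrams(text, n_max=3):
    """Yield character n-grams of length 1..n_max from each lowercased word.

    Each word is padded with '_' so that word boundaries become features.
    """
    for word in text.lower().split():
        padded = f"_{word}_"
        for n in range(1, n_max + 1):
            for i in range(len(padded) - n + 1):
                yield padded[i:i + n]


def learn_profile(documents, size=300):
    """Build a language profile: the `size` most frequent n-grams -> rank.

    `size` is a hypothetical parameter standing in for the "profile size".
    """
    counts = Counter()
    for doc in documents:
        counts.update(ngrams(doc))
    # Sort by descending frequency; ties broken alphabetically so that
    # ranks are deterministic.
    ranked = sorted(counts, key=lambda g: (-counts[g], g))[:size]
    return {gram: rank for rank, gram in enumerate(ranked)}


def distance(doc_profile, lang_profile):
    """Out-of-place distance between a document and a language profile.

    N-grams missing from the language profile cost a fixed penalty equal
    to the size of that profile.
    """
    penalty = len(lang_profile)
    return sum(
        abs(rank - lang_profile[gram]) if gram in lang_profile else penalty
        for gram, rank in doc_profile.items()
    )


def identify(text, profiles, max_words=30):
    """Return the key of the closest profile in `profiles`.

    With max_words=30 only the first 30 words are used; pass
    max_words=None to use the whole content.
    """
    words = text.split()
    if max_words is not None:
        words = words[:max_words]
    doc_profile = learn_profile([" ".join(words)])
    return min(profiles, key=lambda lang: distance(doc_profile, profiles[lang]))


if __name__ == "__main__":
    # Toy training data; real profiles would need far larger collections.
    profiles = {
        "English": learn_profile(["the quick brown fox jumps over the lazy dog"]),
        "German": learn_profile(["der schnelle braune fuchs springt über den faulen hund"]),
    }
    print(identify("the dog jumps over the fox", profiles))
```

Truncating to the first 30 words keeps the per-document work small and roughly constant, which is consistent with the low latency quoted above. A larger profile size generally trades some speed and memory for accuracy.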