LangID - a tool for language identification in SMILA 0.7

Solution Description

The main function of the LangID tool is the automatic identification of the language of any text provided as input. It is based on an n-gram approach to language identification, which makes it very fast. It can distinguish among 26 languages:

Catalan, Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Finnish, French, German, Hungarian, Icelandic, Indonesian, Italian, Latvian, Lithuanian, Malay, Norwegian, Portuguese, Romanian, Serbian, Slovak, Slovenian, Spanish, Swedish

The Language Identifier can also learn profiles for new languages from a collection of language-specific documents. It can detect the language of a document from either its first 30 words or its whole content. Detection precision lies between 98% and 99.5%, depending on the profile size. Latency is about 6 ms when only the first 30 words of a document are considered.
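
To illustrate the general idea of n-gram language identification, here is a minimal Python sketch. It is not LangID's implementation or API, which this listing does not document. It assumes the classic rank-order scheme: a language profile is a ranked list of the most frequent character n-grams, and a document is assigned to the language whose profile ranks are closest to its own. All function names, the profile size of 300, the fixed penalty for unseen n-grams, and the two toy training texts are hypothetical choices made for this example.

```python
from collections import Counter

PROFILE_SIZE = 300  # assumed profile size; the real tool's sizes are not documented here


def ngrams(text, n_max=3):
    """Yield character n-grams of length 1..n_max from each padded, lower-cased word."""
    for word in text.lower().split():
        padded = f"_{word}_"
        for n in range(1, n_max + 1):
            for i in range(len(padded) - n + 1):
                yield padded[i:i + n]


def build_profile(text, size=PROFILE_SIZE):
    """Map the `size` most frequent n-grams to their rank (0 = most frequent)."""
    counts = Counter(ngrams(text))
    # Sort by descending count, then alphabetically, so ranks are deterministic.
    ranked = sorted(counts, key=lambda g: (-counts[g], g))[:size]
    return {gram: rank for rank, gram in enumerate(ranked)}


def distance(doc_profile, lang_profile, penalty=PROFILE_SIZE):
    """Sum of rank differences; n-grams missing from the language profile cost `penalty`."""
    return sum(
        abs(rank - lang_profile[gram]) if gram in lang_profile else penalty
        for gram, rank in doc_profile.items()
    )


def identify(text, profiles, max_words=30):
    """Return the closest language, using only the first `max_words` words (None = all)."""
    if max_words is not None:
        text = " ".join(text.split()[:max_words])
    doc_profile = build_profile(text)
    return min(profiles, key=lambda lang: distance(doc_profile, profiles[lang]))


# Toy training data, far smaller than any realistic language-specific collection.
profiles = {
    "English": build_profile(
        "the quick brown fox jumps over the lazy dog and the cat sits on the mat "
        "while the children are playing in the garden with their friends"
    ),
    "German": build_profile(
        "der schnelle braune fuchs springt über den faulen hund und die katze sitzt "
        "auf der matte während die kinder im garten mit ihren freunden spielen"
    ),
}

print(identify("the children are playing with the dog in the garden", profiles))
print(identify("die kinder spielen mit dem hund im garten", profiles))
```

Learning a profile for a new language amounts to calling `build_profile` on a larger collection of text in that language and adding the result to `profiles`. Passing `max_words=None` scores the whole document instead of only its first 30 words, which trades speed for more evidence.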

Additional Details

Organization Name: DFKI GmbH

Development Status: Production/Stable

Date Created: Wednesday, June 8, 2011 - 11:33

License: Commercial

Date Updated: Tuesday, January 10, 2012 - 05:21

Submitted by: Bogdan Sacaleanu

