LangID - a tool for language identification in SMILA 0.7

Solution Description

The main function of the LangID tool is to automatically identify the language of any text provided as input. It is based on an n-gram approach to language identification, which makes it very fast. It can distinguish among 26 languages:

Catalan, Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Finnish, French, German, Hungarian, Icelandic, Indonesian, Italian, Latvian, Lithuanian, Malay, Norwegian, Portuguese, Romanian, Serbian, Slovak, Slovenian, Spanish, Swedish

The Language Identifier can also learn profiles for new languages from a collection of language-specific documents. It can detect the language of a document from either its first 30 words or its whole content. Detection precision lies between 98% and 99.5%, depending on the profile size. The latency of the component is about 6 ms when only the first 30 words of a document are considered.
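To illustrate how n-gram profiles can be learned and then used for detection, here is a minimal Python sketch. It is not LangID's actual implementation or API: it assumes a rank-ordered character n-gram profile compared with an "out-of-place" distance, a common variant of the n-gram approach. All function names, the default profile size of 300, and the sample texts are hypothetical choices made for illustration only.

```python
from collections import Counter


def ngrams(text, n_max=3):
    """Yield character n-grams (length 1..n_max) of each word, padded with '_'."""
    for word in text.lower().split():
        padded = f"_{word}_"
        for n in range(1, n_max + 1):
            for i in range(len(padded) - n + 1):
                yield padded[i:i + n]


def learn_profile(text, size=300):
    """Map the `size` most frequent n-grams to their rank (0 = most frequent).

    `size` is an assumed default, not a value taken from LangID.
    """
    counts = Counter(ngrams(text))
    return {gram: rank for rank, (gram, _) in enumerate(counts.most_common(size))}


def distance(doc_profile, lang_profile):
    """Out-of-place distance: sum of rank differences over the document's
    n-grams, with a fixed penalty for n-grams missing from the language profile."""
    penalty = len(lang_profile)
    return sum(
        abs(rank - lang_profile[gram]) if gram in lang_profile else penalty
        for gram, rank in doc_profile.items()
    )


def identify(text, profiles, max_words=30):
    """Return the key of the closest profile.

    Only the first `max_words` words are used; pass max_words=None
    to use the whole content instead.
    """
    words = text.split()
    if max_words is not None:
        words = words[:max_words]
    doc_profile = learn_profile(" ".join(words))
    return min(profiles, key=lambda lang: distance(doc_profile, profiles[lang]))


if __name__ == "__main__":
    # Toy training texts; a real profile would be learned from many documents.
    profiles = {
        "en": learn_profile("the quick brown fox jumps over the lazy dog"),
        "de": learn_profile("der schnelle braune fuchs springt über den faulen hund"),
    }
    print(identify("the quick brown fox jumps over the lazy dog", profiles))
```

In a sketch like this, the profile size trades memory and comparison time against accuracy, and truncating the input to the first 30 words bounds the work per document regardless of its length.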

Additional Details

Organization Name: DFKI GmbH

Development Status: Production/Stable

Date Created: Wednesday, June 8, 2011 - 11:33

License: Commercial

Date Updated: Tuesday, January 10, 2012 - 05:21

Submitted by: Bogdan Sacaleanu
