Language Identification Tools


System | Number of languages | Availability
TextCat | 69 | free
SILC/Alis | 28 | commercial
Xerox MLTT Language Identifier | 47 | commercial
SUN's language identifier | 12 | ?
Collexion | 15 | commercial
Stochastic Language Identifier | 13 | free
Another demo of TextCat, with different language models, by Beat Flepp | 11 | cf. TextCat
Natural Language Identification Tool (Giguet) | 4 | ?
Neural Network for Language Identification | 4 | ?
Rosette Language Identifier by Basis Technology | 30 | commercial
IDRIS LingWhat? | ? | ?
Language Identification program by Ted Dunning | 2 | free
Lextek Language Identifier | many | commercial/free
LangWitch by Morphologic | 7 | commercial
Language identifier by Petamem | 65 | ?
Python script by Damir Cavar | 5 | free
libtextcat | cf. TextCat | free (BSD)
Java implementation of TextCat | cf. TextCat | free (?)
Languid | 72 (including such languages as Pig Latin, Klingon, and both "ukrainian" and "ukranian"). The author writes: "I've been a big fan of TextCat, and wanted to see what happened if I combined the same algorithm for n-gram based identification with some intelligence about Unicode. The result is a Unicode-friendly language identifier that makes some initial guesses based on script block. It relies on proper UTF-8 input to be happy." | GPL (download)
Mguesser | about 100 charset/language pairs; about 50 languages. C implementation of textcat | GPL
Python implementation of textcat | |
lid | 23, in a range of encodings; a particular feature of this language identifier is that it can identify the language of transliterated text for some languages | commercial
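
The Languid entry describes combining n-gram based identification with an initial guess based on script block. The following is a minimal sketch of that idea, not Languid's or TextCat's actual code. It assumes the rank-order ("out-of-place") comparison of character n-gram profiles commonly associated with TextCat-style identifiers, and it approximates the script block by the first word of each letter's Unicode character name. All function names, the tiny training sentences, and the profile size are hypothetical choices made for illustration.

```python
import unicodedata
from collections import Counter


def guess_script(text):
    """Return the most common script-like prefix among letters, or None."""
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            # unicodedata.name() gives e.g. 'CYRILLIC SMALL LETTER A'.
            # Assumption: the first word is a rough stand-in for the script.
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else None


def ngram_profile(text, max_n=3, top=300):
    """Map the most frequent character n-grams (1..max_n) to their rank."""
    counts = Counter()
    for word in text.lower().split():
        padded = "_" + word + "_"  # mark word boundaries
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    ranked = [gram for gram, _ in counts.most_common(top)]
    return {gram: rank for rank, gram in enumerate(ranked)}


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams missing from the model get a fixed penalty."""
    penalty = len(lang_profile)
    return sum(
        abs(rank - lang_profile[gram]) if gram in lang_profile else penalty
        for gram, rank in doc_profile.items()
    )


def identify(text, models):
    """Pick the closest model, first narrowing candidates by script."""
    script = guess_script(text)
    candidates = {lang: prof for lang, (scr, prof) in models.items() if scr == script}
    if not candidates:  # no model shares the script: fall back to all models
        candidates = {lang: prof for lang, (_, prof) in models.items()}
    doc = ngram_profile(text)
    return min(candidates, key=lambda lang: out_of_place(doc, candidates[lang]))


# Hypothetical toy training data; real models are built from much larger corpora.
TRAINING = {
    "english": "the quick brown fox jumps over the lazy dog and then the dog sleeps in the sun",
    "german": "der schnelle braune fuchs springt über den faulen hund und dann schläft der hund in der sonne",
    "russian": "быстрая коричневая лиса прыгает через ленивую собаку и потом собака спит на солнце",
}
MODELS = {lang: (guess_script(t), ngram_profile(t)) for lang, t in TRAINING.items()}
```

With these toy models, identify("собака спит на солнце", MODELS) returns "russian" because only one model uses Cyrillic, so the script guess settles the question before any n-gram comparison matters. Distinguishing languages that share a script, such as English and German here, depends entirely on the n-gram profiles and needs far more training text than this sketch provides.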

MyGengo Is Mechanical Turk For Translations

by Serkan Toto on January 11, 2010

Several ways to translate web sites, texts or documents online have emerged in the past few years, with Google Translate probably being the best-known tool. Google’s service is free and works for most quick-and-dirty translations, but when it comes to delivering truly accurate results, nothing can beat a human translator. In 2008, Google itself toyed with the idea of establishing the so-called Google Translation Center, a marketplace that was supposed to match translators with people who need texts translated.

The concept was later shelved, and now a startup called MyGengo is trying to become the world’s Mechanical Turk for translations. MyGengo offers human translation services between English, Japanese, Chinese, Spanish, Italian and Russian. It works much like Amazon’s crowdsourced marketplace: The site’s 600 “certified” translators wait for a customer to upload a document or text and pick up the jobs they can handle. Customers can choose between three quality and pricing levels and usually get the translations back within a few hours, saving up to 70% in costs when compared to professional translators.

MyGengo says its system makes it possible to accept just about any job size, including those usually turned down by traditional translation agencies. Customers can have books, office documents, newspaper articles, blog posts or even tweets translated. (MyGengo itself translates selected English tweets from TechCrunch, Ashton Kutcher and others into Spanish to show what that looks like.)

The service offers two specific solutions for people who need to localize a website or an application: Starting March 2010, an API will speed up the process of requesting the translation of frequently updated content, for example blog posts or comments. Another solution dubbed String lets developers manage all language “strings” of a multilingual website through a dashboard during the localization process. This hosted service will link to the API, but using String by itself is completely free (more background).

Founded in Tokyo in 2008, the startup can count on the support of Silicon Valley-based investor Dave McClure in its efforts to conquer the American market for web-powered translations. McClure, who discovered MyGengo during the previous Geeks on a Plane trip to Japan, decided to make a personal (seed) investment in the company just last week and became an advisor, too.

MyGengo and its new investor are looking at a large market: The language service market as a whole is already worth over $14 billion (MyGengo estimates web-based translations are worth around $3 billion), with some sources predicting it will balloon to $25 billion by 2013. (See this industry report for details.)