Spelling Checkers — language specifications

Languages and sizes of dictionaries

New languages: Oriya (India), Luxembourgish, Friulian (Italy), Kazakh (Cyrillic/Latin), Khmer (Cambodia), Sinhala (Sri Lanka), Kurdish (Northern), Nepalese, Arabic, Azerbaijanian.

Recently upgraded spell checker languages: Dutch, Flemish, Surinam Dutch, Norwegian, Nynorsk, Swedish, Finnish, Czech, Spanish (4x), German, Swiss German, Austrian German, English (5x), Icelandic, Zulu, Xhosa, Afrikaans, Estonian, Danish, Frisian, French/Canadian French, Portuguese (acordo ortográfico), Catalan (nova ortografia), Russian, Hindi (India), Marathi (India), Telugu (India), Punjabi (India), Tamil (India), Gujarati (India), Bengali (Bangladesh), Malayalam (India), Breton, Italian, Slovenian, Croatian, Bosnian, Serbian, Macedonian.

The Arabic and Hungarian lexicons, at about 5 million words each, have become the largest ever built without any artificial tricks.

96 languages (varieties)

English (lexicon size between 470,000 and 473,000 plus, selection May 2020)
The American English (1), British English (2), Canadian English (3), South African English (4), and Australian/New Zealand English (5) versions include a set of collocations and automatic respelling functions between the American, Canadian, and British orthographical varieties, e.g.,
• (to UK) counseling -> counselling or
• (to US) counselling -> counseling;
• (UK & US) Mao Tse-tung -> Mao Zedong;
• (UK) 1400 AD -> AD1400 (anno Domini; see the style guides of the New York Times and the Economist).
Be careful with expressions such as "Thank God its Friday!": without an apostrophe it looks a bit strange. Therefore, a set of multiple word corrections is included, e.g.,
thank God its Friday -> thank God it's Friday,
or with multiple alternatives of contaminations (blendings),
redo it over -> 1) redo it, 2) do it over, etc.
tooth and tong -> 1) tooth and nail, 2) hammer and tongs, etc.
The lexicons agree with the leading unabridged dictionaries. The supplied idiom includes an extensive medical, chemical, social, and geographical lexicon. Finally, the idiom includes an extensive orthographical variety of "how-to-build" compounds.
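The multiple word corrections above can be pictured as a lookup from an erroneous phrase to one or more ranked alternatives. The Python sketch below is illustrative only: the table PHRASE_CORRECTIONS and the function suggest_phrases are hypothetical names, the entries are simply the examples quoted above, and nothing here describes how the product itself stores or matches its collocations.

```python
import re

# Hypothetical correction table, seeded with the examples quoted in the text.
PHRASE_CORRECTIONS = {
    "thank god its friday": ["thank God it's Friday"],
    "redo it over": ["redo it", "do it over"],
    "tooth and tong": ["tooth and nail", "hammer and tongs"],
}

def suggest_phrases(text):
    """Return (matched phrase, alternatives) pairs found in text, ignoring case."""
    hits = []
    lowered = text.lower()
    for wrong, alternatives in PHRASE_CORRECTIONS.items():
        # Word boundaries prevent a match that starts or ends inside another word.
        if re.search(r"\b" + re.escape(wrong) + r"\b", lowered):
            hits.append((wrong, alternatives))
    return hits
```

A phrase with several alternatives is returned as an ordered list, so a user interface could present the choices as 1), 2), and so on.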

French/Canadian French (lexicon size over 628,500, idiome nouveau, selection May 2020)
Includes the most extensive geographical lexicon. Two lexicons are available: one follows the spelling of Le Larousse (2008), Le Nouveau Petit Robert (2012) and Le Robert illustré & Internet 2015; the other follows the most recent Rectifications de l'orthographe of the Conseil supérieur de la langue française, first published 6 December 1990/1998 (see also http://www.orthographe-recommandee.info), which are becoming more and more widely accepted. The new French spelling is not mandatory, but it is officially recommended. The changes are moderate and affect about two thousand words.
Examples:
un compte-goutte, des compte-gouttes;
un après-midi, des après-midis; cout;
entrainer, nous entrainons;
paraitre, il parait; j'amoncèle, amoncèlement, tu époussèteras.
Most importantly, the spelling rectifications were initially approved by:
Le Conseil supérieur de la langue française (Paris);
L'Académie française (France).
Both French and Canadian French versions include extensive (automatic) re-spelling tools between previous and new spelling forms.

German (lexicon size 1,276,000, selection May 2020)
The German orthography has been updated with the acceptance of the uppercase Eszett. Previous versions will be kept in the meantime.
The German spelling is distributed in four versions: alt (pre-1996), new 1996 (the very first reform), new current orthography (2017), and the dpa version (2017). These versions include automatic respelling from old to new spelling forms (e.g., Prozeß → Prozess) and of "feste grammatische und lexikale Wendungen" (fixed grammatical and lexical expressions). The old orthography ("alte Rechtschreibung") lets you purify your texts with a full respelling system from new to old (e.g., Prozess → Prozeß). A version for the Nachrichtenagenturen (dpa), as proposed by the German-speaking news agencies, is also available (http://www.die-nachrichtenagenturen.de).
The orthography neue Rechtschreibung is updated according to the Duden 26, August 2017, and the "Rat für deutsche Rechtschreibung", Grundlagen der Deutschen Rechtschreibung (2017), including the Eszett-Schreibung (ß/ẞ).
The German lexicon includes more than 300,000 expanded catchwords (konjugierte Stichwörter as defined in the Duden), all German toponyms (Ortsnamen), over 13,000 autocorrections (Umschreibungen), and an extensive medical lexicon. Moreover, spell checking is strict: errors such as Oberklasse-Wagen, Oberstufe-Schüler, and Klasse-Bücher are not accepted. The correct forms are Oberklassenwagen, Oberstufenschüler, and Klassenbücher.
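Respelling in both directions, as described for the old and new orthographies, can be sketched with a single table and its derived inverse. This is a minimal Python illustration under stated assumptions: OLD_TO_NEW, NEW_TO_OLD and respell are hypothetical names, the first pair is the example from the text, and the second pair (daß → dass) is a further 1996 reform change added here only for illustration.

```python
# Old-to-new respellings. "Prozeß" -> "Prozess" is the example from the text;
# "daß" -> "dass" is an additional reform change included for illustration.
OLD_TO_NEW = {"Prozeß": "Prozess", "daß": "dass"}

# The reverse table is derived, so both directions always stay in sync.
NEW_TO_OLD = {new: old for old, new in OLD_TO_NEW.items()}

def respell(words, table):
    """Replace every word found in the table; leave all other words untouched."""
    return [table.get(word, word) for word in words]
```

A plain inverse table only works while each new form maps back to exactly one old form; a real new-to-old system would presumably need context for ambiguous cases.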

Swiss German (lexicon size 1,310,000, Swiss additions to German, selection May 2020)
The Swiss German lexicon includes all Swiss toponyms (Schweizerische Ortsnamen). There are three versions (alt, neu, dpa/SDA 2017); see German.

Austrian German (lexicon size 1,291,000, Austrian additions to German, selection May 2020)
The Austrian lexicon includes all Austrian toponyms (Österreichische Ortsnamen). There are three versions (alt, neu, dpa 2017); see German.

Spanish Peninsular, Argentine, Mexican & Latin American (lexicon size 975,500 - 982,500, selection May 2020)
The spelling follows the new orthographical rules presented in the latest edition (la última edición) of the Ortografía de la lengua española (2010) and the Diccionario de la lengua española, RAE (2014). It includes respelling of a set of orthographical changes and common errors, e.g., exteniente coronel → ex teniente coronel, ex presidente brasileño → expresidente brasileño, anti-mafia → antimafia, Adam y Eva → Adán y Eva, Edinburgo → Edimburgo.

Italian (lexicon size 985,000, selection April 2020)
The spelling follows Lo Zingarelli 2014. It includes pronominal forms, an extensive geographical lexicon (comuni e luoghi italiani), and a set of multiple word corrections, e.g., il pneumatico -> lo pneumatico, vicino Roma -> vicino a Roma.

Swedish (lexicon size over 2.015 million words, selection June 2020)
Includes geographical and proper names, SI unit correction and punctuation correction (not «blod, svett och trådar», but »blod, svett och tårar»); orthography according to Svenska Akademiens ordlista över svenska språket.

Portuguese (lexicon size 1.685 million words, selection April 2020)
Iberian and Brazilian Portuguese differ greatly in their use of verb tenses and in idiom. Brazilian Portuguese is often unacceptable in Iberian Portuguese publications, and the reverse is a source of misunderstanding too. Independently of orthography, the dictionaries need to be different. Therefore, Iberian and Brazilian versions have been compiled for both the previous orthography and the acordo ortográfico. These versions include respelling either between Iberian Portuguese and Brazilian Portuguese or between the previous orthography and the acordo ortográfico. The President of Portugal, Aníbal Cavaco Silva, promulgated the orthographic agreement of the Portuguese language, ratified by the country's Parliament in May, sources in the Presidency told Agência Efe today. ... The New Orthographic Agreement of the Portuguese Language has been in force in Brazil since the 1st of the month (2009, Grande Dicionário 2010).
Examples: equipolente versus eqüipolente or boleia versus boléia or ação versus acção

Dutch (Nederlands, lexicon size 806,000, selection July 2020)
The spelling follows the governmental rules (Groene Boekje, Workgroup Spelling, 2005, Taalunie, update Taalunie errata 27-08-2010) and agrees with Van Dale Groot Woordenboek van de Nederlandse Taal (XIV ed.).
The lexicon's idiom covers national and worldwide geographic information as well as medical, administrative, social, and many other special terms. A set of over 27,000 collocations and (respelling) autocorrections from the previous to the new orthography is included. This set includes multiple word alternatives for odd combinations such as "door de regen en de wind" -> 1) "door weer en wind" or 2) "in de regen en de wind", a linguistic mutilation of "(come) rain and shine".

Flemish (Vlaams, lexicon size 829,000, selection July 2020)
The spelling follows the governmental rules (Groene Boekje, Workgroup Spelling, 2005, Taalunie, update Taalunie errata 27-05-2008) and agrees with Van Dale Groot Woordenboek van de Nederlandse Taal (XIV ed.).
The lexicon's idiom covers national and worldwide geographic information as well as medical, administrative, social, and many other special terms. A set of over 27,000 collocations and (respelling) autocorrections from the previous to the new orthography is included. This set includes multiple word alternatives for odd combinations such as "kost duur" -> 1) "is duur" or 2) "kost veel", a linguistic mutilation of "is expensive".

Surinam Dutch (Surinaams-Nederlands, lexicon size 807,500, selection July 2020)
The Republic of Surinam joined the Dutch Taalunie in January 2005 to unify its language with the Dutch language. The peculiarities of Surinam Dutch call for a separate lexicon. The spelling agrees with the governmental rules (Groene Boekje, Workgroup Spelling, 2005, Taalunie).
The lexicon's idiom covers national and worldwide geographic information as well as medical, administrative, social, and many other special terms. A set of collocations and respellings from the old to the new orthography is included.

Catalan (nova ortografia) (lexicon size 1,420,000, selection March 2018)
The lexicon includes all combinations of pronoms personals. The spelling agrees with Diccionari ortogràfic i de pronúncia, Enciclopèdia Catalana. The nova ortografia (not arítmia but arrítmia, not angiospasme but angioespasme, not dóna'ls-el but dona'ls-el) agrees with the Institut d'Estudis Catalans (May 2017), whose spelling reform has been introduced with a transition period of five years.

Danish (lexicon size over 1,063,500, selection May 2020)
The spelling agrees with contemporary Danish spelling according to the Retskrivningsordbogen of the Dansk Sprognævn, med ændrede ord- og staveformer (with changed word and spelling forms), including the recent orthographic changes (2012). It includes geographic names (stednavne), present-day idiom, and a set of multiple word corrections (faste udtryk), e.g., æblerne lægger i skålen -> æblerne ligger i skålen, web siterne -> websiterne, etc.

Norwegian, Nynorsk (lexicon size Bokmål 1,258,000 Nynorsk 615,500, selection June 2020)
The spelling agrees with the Contemporary Norwegian spelling according to Tanums Store Rettskrivningsordbok and the Norwegian Språkrådet. It includes present-day idiom, and a set of multiple word corrections, e.g., i Møre -> på Møre (always notify), på Møre og Romsdal -> i Møre og Romsdal, etc.

Sámi (lexicon size 1.6 million, selection March 2008)
The spelling agrees with the North Sámi language as spoken in Finnmark county in the north of Norway. Sámi is a highly inflected language and words can have numerous word forms. This feature makes the North Sámi lexicon very lengthy.

Finnish (lexicon size over 5.34 million words, selection June 2020)
Spelling and templates (taivutustyypit) of the revised modules agree with contemporary Finnish as described in Kielitoimiston sanakirja, 2012. The lexicon has been extensively tagged with declension and conjugation classifications. This is a requirement given the compound nature of the Finnish language.

Afrikaans (lexicon size 338,000, selection May 2020)
The lexicon agrees with the spelling rules of the Suid-Afrikaanse Taalkommissie, 2002. It matches the present-day idiom of South African society, including a wide variety of neologisms and geographical, business, and social words. The spell checker includes a mechanism that checks neologisms by examining their component parts. This mechanism doubles the effective size of the lexicon.
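Checking a neologism by its component parts can be sketched as a test of whether the word splits completely into known lexicon entries. The Python function below is a simplified illustration, not the product's mechanism: is_known_or_compound and min_part are hypothetical names, the two-word lexicon in the usage note is a toy example, and linking elements between parts are ignored.

```python
def is_known_or_compound(word, lexicon, min_part=3):
    """Accept word if it is in the lexicon or splits entirely into lexicon entries."""
    word = word.lower()
    # reachable[i] is True when word[:i] can be built from lexicon entries.
    reachable = [True] + [False] * len(word)
    for end in range(1, len(word) + 1):
        for start in range(end):
            part = word[start:end]
            # min_part rejects very short parts, which would accept almost anything.
            if reachable[start] and len(part) >= min_part and part in lexicon:
                reachable[end] = True
                break
    return reachable[len(word)]
```

With a toy lexicon {"rekenaar", "program"}, the unlisted compound "rekenaarprogram" is accepted while "rekenaarprogramx" is rejected.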

Latin (lexicon size 450,000, selection August, 2007)
The Latin lexicon has been compiled from classical, medieval, clerical, vulgate, and scientific texts. Names from the classical period and from the clerical (and Biblical) world have been included in the lexicon. Like dictionary publishers, we do not use ligatures: oeconomicae, Aegiptum, etc.

Basque (lexicon size 3.8 million, selection August 2010)
The Basque language is highly inflected, and so is the Basque lexicon. Financial, scientific, and geographical terms and proper names are included in the lexicon: Euskadi, Euskadik, Euskadiko, Euskadikoa, Euskadin, Euskadira, Euskadiren, Euskadirentzat, Euskaditik, Euskadiz, amortizazio-prezio..., banku-txartel..., efektu-biomarkatzaile..., epitelio-zelula..., etc. The Basque spell checker is capable of detecting differences between inanimate and animate word classes.
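The Euskadi forms listed above show how one stem unfolds into many lexicon entries. The Python sketch below simply concatenates a stem with a suffix list read off those examples; SUFFIXES and expand are hypothetical names, and applying the same list to other stems is an assumption that ignores stem-dependent changes and the animate/inanimate distinction.

```python
# Suffixes read off the Euskadi examples in the text; "" keeps the bare stem.
SUFFIXES = ["", "k", "ko", "koa", "n", "ra", "ren", "rentzat", "tik", "z"]

def expand(stem, suffixes=SUFFIXES):
    """Return the stem followed by each suffix, in list order."""
    return [stem + suffix for suffix in suffixes]
```

Calling expand("Euskadi") reproduces the ten forms quoted in the text, which illustrates why the lexicon of a highly inflected language grows so large.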

Russian (Россия) (lexicon size 1,390,000, selection February 2018)
The Russian language goes back to Old Church Slavic, but a literary tradition less tied to the church and Old Church Slavic exists too. The last extensive spelling reform occurred in 1917; the idiom, however, has changed since then.

Estonian (lexicon size 2,235,000, selection January 2020)
The Estonian language belongs to the Finno-Ugric family of languages. It is closely related to Finnish and, as in Finnish, the equivalents of prepositions are attached to the end of the word, but Estonian has no vowel harmony.

Icelandic (lexicon size 845,000, selection May 2020)
The Icelandic language is a North Germanic (Scandinavian) language, since 1935 the official language of Iceland. Icelandic is characterized by extensive vowel gradations, for masculine, feminine and neuter. The historical morphological characteristics have been preserved.

Lithuanian (lexicon size 875,000, selection September 2012)
The Lithuanian language, like Latvian, belongs to the Baltic family of languages. Lithuanian uses the Latin alphabet with diacritics, including <ė>, <į>, <ų>. Lithuanian is highly inflected.

Latvian (lexicon size 705,000, selection August 2012)
The Latvian language is one of the Baltic languages (see Lithuanian). The orthography is based on the Latin alphabet with diacritic marks, including <ņ>, <ķ>, <ģ>, <ļ>.

Polish (lexicon size 1.9 million, selection December 2013)
The Polish language is a West Slavic language spoken by approximately 42 million speakers. It is written in the Latin alphabet with diacritic marks and special characters: ł, Ł, ż, Ż.

Frisian (Frysk) (lexicon size 433,000, selection April 2020)
The Frisian language is spoken by approximately 300,000 speakers in the Dutch province of Friesland. It has been standardized thanks to the efforts of the Fryske Akademy. It is distinct from the East and North Frisian dialects in Northern Germany. The orthography agrees with the "offisjele stavering fan de Fryske taal 2014" published by the Fryske Akademy, e.g., ienentweintichste-iuwsk not ienentweintichste-ieusk.

Galician (lexicon size 245,000, selection August 2007)
The Galician language is now spoken in Spanish Galicia, situated north of Portugal. It is a Romance language related to Portuguese. Spelling according to "Dicionário da língua galega, Sotelo Blanco".

Hungarian (lexicon size over 5 million words, selection December 2009)
The Hungarian language belongs to the Uralic family of languages. It is the official language of Hungary. There is a weak relation to the Finno-Ugric languages. The orthography includes characters with the double acute accent (Hungarumlaut): <ő>, <ű>.

Czech (lexicon size 1,804,000, selection June 2020)
The Czech language is a West Slavic language. The orthography is based on the Latin alphabet, including diacritics: <č>, <ď>, <ě>, <ů>, <ž>.

Upper Sorbian (lexicon size 770,000, selection January 2009)
The Upper Sorbian language is a West Slavic language. The orthography is based on the Latin alphabet. Upper and Lower Sorbian are spoken in the south-eastern part of the former German Democratic Republic. Spelling agrees with Hornjoserbskeje rěčneje komisje hač do junija 2005.

Maltese (lexicon size 845,000, selection January 2006)
The Maltese language is a Semitic language written in the Latin alphabet, including <ċ> <ħ> <ġ> and <ż>, orthography according to Joseph Aquilina (1987/1990). The speller includes checks for proper use of assimilations of the article.

New Greek (Ελληνικά) (lexicon size 785,000, selection September 2009)
The Greek characters α, β, γ, ... to ω have been used for millennia. We do not know how Ancient Greek was pronounced, but Modern Greek certainly is different. It now uses only a limited number of accents and diaereses.

Occitan (lexicon size 250,000, selection June 2007)
Occitan, also known as Languedoc, is the original language spoken by the troubadours and Cathars in the South of France. The reconstruction of the language is based on the work of Loís Alibèrt (2000).

Esperanto (lexicon size 300,000, selection August 2003)
Esperanto is an artificial language, introduced by Dr. Lazaro Ludoviko Zamenhof. The language is based on several Indo-European languages. Typical of Esperanto are the characters <ĉ>, <ĝ>, <ĥ>, <ĵ>, <ŝ> and <ŭ>.

Turkish (lexicon size 1,860,000, selection November 2015)
The Turkish language is written in the Latin alphabet, but a few characters were added, such as the dotless i (ı), which is very different from the dotted i. Therefore the letter i is not the lowercase form of the capital letter I, which is a major problem for many systems. A geographical and medical lexicon is included.
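The dotted and dotless i problem can be made concrete with a small case-mapping sketch. Python's built-in str.lower() and str.upper() are not locale-aware, so the hypothetical helpers turkish_lower and turkish_upper below remap the four i letters first; this is one possible workaround and says nothing about how the spell checker itself handles case.

```python
# Turkish pairs: I <-> ı (dotless) and İ <-> i (dotted).
_TR_LOWER = str.maketrans({"I": "ı", "İ": "i"})
_TR_UPPER = str.maketrans({"ı": "I", "i": "İ"})

def turkish_lower(text):
    """Lowercase text using the Turkish i mapping before the default rules."""
    return text.translate(_TR_LOWER).lower()

def turkish_upper(text):
    """Uppercase text using the Turkish i mapping before the default rules."""
    return text.translate(_TR_UPPER).upper()
```

For example, turkish_lower("KIZ") gives "kız", whereas the default "KIZ".lower() gives "kiz", a different string that would not match a Turkish lexicon entry.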

Romanian (lexicon size 1,000,000, selection June 2009)
The Romanian language belongs to the Romance languages. It includes a few additional characters such as the a-breve <ă>, the i-circumflex <î>, the s-cedilla <ş>, the t-cedilla <ţ>, the s-comma below <ș>, and the t-comma below <ț>.
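The cedilla and comma-below letters are distinct Unicode code points, so the same word can arrive in two encodings. One possible way to handle this, shown here as an assumption rather than as the product's behaviour, is to map the cedilla forms to the comma-below forms before lookup; normalize_romanian is a hypothetical helper name.

```python
# Map cedilla letters to their comma-below counterparts.
_CEDILLA_TO_COMMA = str.maketrans({
    "\u015F": "\u0219",  # ş -> ș
    "\u015E": "\u0218",  # Ş -> Ș
    "\u0163": "\u021B",  # ţ -> ț
    "\u0162": "\u021A",  # Ţ -> Ț
})

def normalize_romanian(text):
    """Return text with s/t cedilla replaced by s/t comma below."""
    return text.translate(_CEDILLA_TO_COMMA)
```

After normalization, both spellings of a word compare equal, so a lexicon would only need to store one of them.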

Bulgarian (lexicon size 840,000, selection February 2016)
The Bulgarian language is written in the Cyrillic alphabet, the same alphabet as used in the Russian language, however, the pronunciation of the hard and soft sign differs.

Faroese (lexicon size 517,000, selection November 2012)
The Faroese language is spoken by 50,000 inhabitants of the Faroe Islands. Like Icelandic, it is based on Old Norse.

Bahasa Indonesia (lexicon size 76,000, selection May 2010)
Bahasa Indonesia is the standard language written and spoken in the Republic of Indonesia. Many Austronesian languages are spoken in the Indonesian Archipelago, but Bahasa Indonesia is the lingua franca.

Slovenian (lexicon size 748,000, selection April 2016)
The Slovenian language is spoken in the Republic of Slovenia, situated between Austria, Hungary, Croatia, and Italy. It is a South Slavic language written in the Latin alphabet, including a few Slavic characters such as <č>, <š>, <ž> and the digraphs Lj and Nj. Slovenian is highly inflected and nearly every noun has an adjective form too.

Croatian (lexicon size 633,000, selection April 2016)
The Croatian language, formerly named Serbo-Croatian, is closely related to Serbian. The Croatian language is written in the Latin alphabet, including a few typical Slavic characters such as <č>, <ć>, <š>, <ž>, and the digraphs Lj and Nj.

Bosnian (lexicon size 650,000, selection April 2016)
The Bosnian language, formerly named Serbo-Croatian, is closely related to Serbian and Croatian.

Serbian Cyrillic (lexicon size 658,000, selection April 2016)
The Serbian language is written in the Cyrillic alphabet, including typical Serbian characters Dž, Lj, Nj (Џ, Љ, Њ).

Byelorussian (lexicon size 1.6 million, selection January 2008)
The Byelorussian language is written in the Cyrillic alphabet, like the Russian language, but the language was heavily influenced by Polish for centuries. Today, in the Byelorussian Republic, Byelorussian plays a lesser role compared to the Russian language.

Slovak (lexicon size 1 million words, selection August 2009)
The Slovak language is closely related to Czech, but a few characters differ.

Ukrainian (lexicon size 1.15 million words, selection November 2008)
The Ukrainian language is written in the Cyrillic alphabet, but for centuries the language was heavily influenced by Polish.

Swahili (lexicon size 75,000, selection February 2005)
The Swahili language is spoken along the East Coast of Africa. It is the lingua franca of many coastal nations. The standardized language is called Kiswahili Sanifu. It shares the word kamusi (dictionary) with the Melayu word kamus. Swahili is written in the Latin alphabet.

Bahasa Melayu (lexicon size 62,000, selection September 2009)
Bahasa Melayu is the standard language of the Republic of Malaysia. It has a common root with Bahasa Indonesia. However, Bahasa Melayu was heavily influenced by the English language while Bahasa Indonesia was influenced by Dutch during the colonial age.

Irish (Gaelic) (lexicon size 325,000, selection August 2007)
The Gaelic language is a Celtic language spoken in Western Ireland. A class of words is lenited, pronounced with palatalization. A slightly different variety is spoken in the Highlands of Scotland.

Welsh (Cymraeg) (lexicon size 900,000, selection July 2015)
The Welsh language is the Celtic language of Wales, spoken by about 500,000 people (mainly bilingual in English). The current lexicon supports hyphens and apostrophes in words. The expression system supports re-writing of erroneous mutations, e.g., "hen gwlad" >> "hen wlad" or "o Castell" >> "o Gastell".
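Rewriting an erroneous mutation, as in the two examples above, can be sketched as applying soft mutation to a word that follows a trigger word. The Python code below is a rough illustration under several assumptions: TRIGGERS contains only the two words from the examples, radicals is a hypothetical set of known unmutated forms (needed so that an already mutated word is not mutated again), and other mutation types and exceptions are ignored.

```python
# Soft mutation of initial consonants; digraphs are listed before single letters.
SOFT_MUTATION = [("ll", "l"), ("rh", "r"), ("p", "b"), ("t", "d"), ("c", "g"),
                 ("b", "f"), ("d", "dd"), ("g", ""), ("m", "f")]
# Initial digraphs that are letters in their own right and are left unchanged.
UNCHANGED = ("ch", "dd", "ff", "ng", "ph", "th")
# Hypothetical trigger list, limited to the two words in the examples.
TRIGGERS = {"hen", "o"}

def soft_mutate(word):
    """Apply soft mutation to the first consonant, keeping initial capitalisation."""
    lower = word.lower()
    if lower.startswith(UNCHANGED):
        return word
    for initial, mutated in SOFT_MUTATION:
        if lower.startswith(initial):
            result = mutated + word[len(initial):]
            if word[0].isupper() and result:
                result = result[0].upper() + result[1:]
            return result
    return word

def fix_mutations(text, radicals):
    """Mutate a word after a trigger only if it is a known radical form."""
    words = text.split()
    for i in range(1, len(words)):
        if words[i - 1].lower() in TRIGGERS and words[i].lower() in radicals:
            words[i] = soft_mutate(words[i])
    return " ".join(words)
```

With radicals = {"gwlad", "castell"}, the inputs "hen gwlad" and "o Castell" are rewritten to "hen wlad" and "o Gastell", while the already correct "o Gastell" is left alone.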

Greenlandic (lexicon size 85,000, selection February 2008)
Greenlandic is an East Inuit language spoken by 50,000 Greenlanders.
The Greenlandic language adds particle after particle to a word, which can lead to a single-word sentence. The Latin alphabet is used, whereas the Canadian Inuit make use of their own script.

Macedonian (lexicon size 324,000, selection April 2016)
The Macedonian language is written in the Cyrillic alphabet.

Albanian (lexicon size 585,000, selection April 2011)
The Albanian language is written in the Latin alphabet. The Albanians call their language shqip and their country Shqipëria.

Maori (lexicon selection March 2004)
The Maori language is spoken in New Zealand and is written in the Latin alphabet. A macron is placed above the vowels to differentiate between long and short vowels.

Xhosa (lexicon size 171,000, selection May 2020)
The Xhosa language is spoken in the Republic of South Africa and is written in the Latin alphabet.

Zulu (lexicon size 370,000, selection May 2020)
The Zulu language is spoken in the Republic of South Africa and is written in the Latin alphabet.

Arabic (العربية) (lexicon size ca. 5 million, selection October 2009)
The Arabic language has its own script, and the orthography is mainly based on consonantal roots. These roots are unfolded to millions of words.

Azerbaijanian (lexicon size 132,000, selection May 2010)
Azerbaijanian is written in the Latin alphabet. It has much in common with Turkish.

Hebrew (עִבְרִית) (lexicon size ca. 5.5 million, selection March 2008)
The Hebrew language is written in Hebrew characters, mainly consonants. The orthography is based on roots of 3 radicals, which unfold to millions of words.

Persian/Farsi (فارسی) (lexicon size 450,000, selection October 2009)
The Persian language is written in the Arabic script, but because it is an Indo-European language, vowels are important.

Urdu (اردو) (lexicon size 131,000, selection October 2009)
The Urdu language is closely related to Hindi, but written in the Arabic script. Urdu and Hindi are Indo-European languages.

Breton (ar brezhoneg) (lexicon size 650,000, selection August 2016)
The Breton language is spoken in Brittany (Bretagne), France. It is a Celtic language related to Cornish, an extinct language of the UK.

Thai (ภาษาไทย) (lexicon size 80,000, selection March 2008)
The Thai language is the official language of Thailand. Thai has its own script, a syllabic script in which most vowels are written above the consonants. Thai is a tone language and the tone marks are always written on top. The words of a sentence are written without spaces, and therefore a sentence has to be segmented (hyphenated) prior to spell checking.
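Segmentation before spell checking can be illustrated with greedy longest-match lookup against a lexicon. This Python sketch is a common textbook approach, not necessarily the method used by the product; segment is a hypothetical function name, and the two-word lexicon in the usage note is a toy example built from the language name above.

```python
def segment(text, lexicon):
    """Greedy longest-match segmentation; unknown characters become one-character tokens."""
    max_len = max(map(len, lexicon), default=1)
    tokens, pos = [], 0
    while pos < len(text):
        # Try the longest candidate first and shrink until a lexicon entry matches.
        for size in range(min(max_len, len(text) - pos), 0, -1):
            candidate = text[pos:pos + size]
            if candidate in lexicon or size == 1:
                tokens.append(candidate)
                pos += size
                break
    return tokens
```

With the lexicon {"ภาษา", "ไทย"}, the unspaced string "ภาษาไทย" is split into "ภาษา" and "ไทย"; any token that is not in the lexicon can then be flagged by the spell checker.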

Hindi (हिन्दी) (lexicon size 245,000, selection January 2018)
The Hindi language is spoken in northern and central India. Written Hindi is relatively standardized over the whole Hindi language area. It is an Indo-Aryan language. Although related to Urdu, Hindi does not favour the use of Persian and Arabic loanwords. Hindi is written in the Devanagari script, which includes many complex characters, consisting of vowels, consonants, vowel signs (matras), numerals, and diacritical marks.

Marathi (मराठी) (lexicon size 268,500, selection January 2018)
The Marathi language is spoken in the Maharashtra state of India. It is an Indo-Aryan language written in the Devanagari script.

Nepalese (नेपाली) (lexicon size 130,000, selection September 2010)
The Nepalese language (Nepali) is spoken in the Himalayan state of Nepal between India and China. Nepalese is written in the Devanagari script.

Kurdish (Northern) (lexicon size 90,000, selection July 2009)
Kurdish belongs to the Iranian group of languages. Kurdish is spoken in Turkey, Iraq, Iran, Armenia, Georgia and Azerbaijan. The Latin script is used for the Northern variety of Kurdish.

Malayalam (മലയാളം) (lexicon size 477,000, selection January 2018)
The Malayalam language is spoken in Kerala, a state in the south of India. It is a Dravidian language written in the Malayalam script, a descendant of the Brahmi script. The Malayalam module supports chillu letters.

Bengali (বাংলা) (lexicon size 566,000, selection January 2018)
The Bengali language is spoken in Bangladesh. It is an Indo-Aryan language written in the Bengali script, a descendant of the Brahmi script.

Gujarati (ગુજરાતી) (lexicon size 189,000, selection January 2018)
The Gujarati language is spoken in the Indian state of Gujarat. It is an Indo-Aryan language written in the Gujarati script, a descendant of the Brahmi script.

Tamil (தமிழ) (lexicon size 162,000, selection January 2018)
The Tamil language is spoken in southern India (Tamil Nadu) and Sri Lanka. It is a Dravidian language written in the Tamil script, a descendant of the Brahmi script. Tamil has many Indo-Aryan loanwords. Tamil in Sri Lanka incorporates loanwords from the Dutch, Portuguese, and English languages.

Sinhala (සිංහල) (lexicon size 208,000, selection November 2009)
The Sinhala language is spoken in Sri Lanka. It belongs to the Indo-Aryan branch of the Indo-European languages and is written in the Sinhala script, a descendant of the Indian Brahmi script. There is some affinity to neighbouring languages. Sinhala has features that may be traced to Dravidian influences.

Punjabi (ਪੰਜਾਬੀ) (lexicon size 94,000, selection January 2018)
The Punjabi language is spoken in the Punjab state of India. It belongs to the Indo-Aryan branch of the Indo-European languages and is written in the Gurmukhi script, a descendant of the Indian Brahmi script.

Telugu (తెలుగు) (lexicon size 240,000, selection January 2018)
The Telugu language is spoken in Andhra Pradesh, one of the largest states of India. It is a Dravidian language written in the Telugu script, a descendant of the Indian Brahmi script.

Oriya (Odia) (lexicon size 331,000, selection January 2018)
The Oriya or Odia language is spoken in the Odisha state of India. It belongs to the Indo-Aryan branch of the Indo-European languages and is written in the Kalinga script, a descendant of the Indian Brahmi script.

Khmer (ភាសាខ្មែរ) (lexicon size 30,000, selection November 2009)
The Khmer language is spoken in Cambodia. It is the second most widely spoken Austroasiatic language. As in Thai, Khmer sentences are written without spaces. Therefore, spell checking strongly depends on segmentation (see Hyphenator languages).

Kazakh (Cyrillic/Latin) (lexicon size 900,000, selection May 2010)
The Kazakh language is spoken east of the Caspian Sea. It is a Turkic language related to Azerbaijanian and Turkish. In Kazakhstan, Kazakh is mainly written in the Cyrillic alphabet, but a transition to the Latin script was proposed by the President of Kazakhstan in 2006. For this reason both Cyrillic and Latin lexicons have been compiled.

Friulian (Furlan) (lexicon size 450,000, selection March 2012)
The Friulian language is spoken in the north-eastern region of Italy. It is Italy's largest regional language.

Luxembourgish (Lëtzebuergesch) (lexicon size 200,000, selection December 2012)
The Lëtzebuergesch language is spoken in the Grand Duchy of Luxembourg. The language/dialect descends from Mosel-Frankish (Moselle Franconian), a dialect linguistically close to High German and Limburgish. The population of Luxembourg is only half a million.