I know, I know: the "top 100" languages list is utter nonsense and surely
does not fit the public relations needs of The Unicode Consortium.
However, as some people took the time to send me corrections and advice, I
tried to integrate them into the list, just for our amusement.
* John Cowan > "Azerbaijan has switched to Latin."
[I moved it]
* Joerg Knappen > Sunda uses Latin; Oromo uses Ethiopic.
[I moved them]
* Roozbeh Pournader > "Sindhi is written in Arabic script."
[I moved it]
* Thomas Chan > "... Other than Mandarin Chinese and Yue Chinese, the
other "Chinese" ones don't really have developed writing traditions, so the
question is sort of academic..."
* John Cowan and I > similar concern for Italian dialects.
[I collapsed most "dialects" under the entry of the "national language"
spoken in the area, assuming that speakers of these languages would use the
"national language" in writing (especially on computers)]
* Kent Karlsson > "... That does not even cover all of the official
languages of the EU! So that "top 100" statement would be highly
[EU languages are not more important than others; moreover many other
languages are missing. Some of these languages (e.g. Hebrew) are relevant
for Unicode because they use a special script, or are tricky, or are "often
used" on computers, so I sort of added them without estimates]
* Janko Stamenovic > split Serbo-Croatian into Serbian (rough estimate:
8..10 million) and Croatian.
[The divorce is done: 10 million to Serbian and the rest to Croatian]
Here are the revised statement and the new list (ordered by writing system;
the numbers show an estimate of the people speaking each language, in
millions):
"Unicode supports the top 100 languages. Unicode also supports all the
official languages used in the EU and many other languages, some of which
require unique writing systems."
*** Latinate alphabet
18 MALAY (also written in Arabic)
8 NIGERIAN FULFULDE
7 HAITIAN CREOLE FRENCH
(all other official languages in the EU)
*** Greek alphabet
*** Cyrillic alphabet
18 NORTHERN UZBEK
10 SERBIAN (also written in Latinate)
*** Armenian alphabet
*** Hebrew alphabet
*** Arabic alphabet
175 ARABIC (all dialects)
30 WESTERN PANJABI
*** Thaana alphabet
*** Devanagari alphabet
*** Bengali alphabet
*** Gujarati alphabet
*** Gurmukhi alphabet
26 EASTERN PANJABI
*** Oriya alphabet
*** Tamil alphabet
*** Telugu alphabet
*** Kannada alphabet
*** Malayalam alphabet
*** Sinhala alphabet
*** Thai alphabet
*** Lao alphabet
*** Myanmar alphabet
*** Georgian alphabet
*** Hangul script
75 KOREAN (also uses CJK ideographs, a.k.a. hanja)
*** Ethiopic script
*** Cherokee script
*** Canadian syllabic script
*** Khmer alphabet
*** Mongolian alphabet
*** Braille patterns
(many languages worldwide)
*** Kana script
125 JAPANESE (also uses CJK ideographs, a.k.a. kanji)
*** CJK ideographs (a.k.a. hanzi, kanji, hanja)
885 MANDARIN CHINESE
66 YUE CHINESE
282 (other Chinese dialects)
*** Yi script
*** Unknown (unwritten?)
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:58 EDT