Continue:Glaring mistake in the code list for South Asian Script from delex r on 2011-09-09 (Unicode Mail List Archive)

From: delex r <delexr_at_indiatimes.com>
Date: Sat, 10 Sep 2011 04:23:57 +0530 (IST)

I find that Unicode has not addressed the sovereignty of individual languages while trying to devise an ASCII-like encoding system for almost all the characters and symbols used on earth. I am continuing with my observation about the glaring mistake Unicode made in naming a South Asian script “Bengali”. Here I would like to give some information that I think will help Unicode in its endeavour to provide a universal character encoding standard that is true even to micro-facts.

India is believed to have at least 1652 mother tongues, of which only 22 are recognized by the Indian Constitution as official languages for administrative communication among local governments and with citizens. The constitution has not explicitly recognized any official script. Just as Unicode has listed languages and scripts, the Indian Constitution has listed the official languages (in its 8th Schedule). The first entry in that list is the Assamese language. Assamese is a sovereign language with its own grammar and “script”, which contains some unique characters that you will not find in any other script so far encoded by Unicode. At least 30 million people call it the “Assamese Script” and, if provided with computers and internet connections, could flood the Unicode e-mail address with confirmations. These characters are, I repeat, the one assigned the hex code 09F0 and the one assigned 09F1 by this universal character encoding system, which has unfortunately described both as “Bengali” Ra etc. I don’t know who advised Unicode to use the tag “Bengali” to name the block that includes these two characters.
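
For anyone who wants to check the names currently attached to these two code points, here is a minimal sketch using Python's standard unicodedata module. The helper name describe is only illustrative, and the result depends on the Unicode database bundled with the interpreter.

```python
import unicodedata


def describe(code_point: int) -> str:
    """Return 'U+XXXX  NAME' for a code point (illustrative helper)."""
    ch = chr(code_point)
    # unicodedata.name() returns the formal character name from the
    # Unicode Character Database; the second argument is a fallback.
    return f"U+{code_point:04X}  {unicodedata.name(ch, '<unnamed>')}"


# The two code points discussed in this message.
for cp in (0x09F0, 0x09F1):
    print(describe(cp))
```

On a reasonably recent Python this should report BENGALI LETTER RA WITH MIDDLE DIAGONAL for 09F0 and BENGALI LETTER RA WITH LOWER DIAGONAL for 09F1, which are the names objected to here.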

If you are not an Indian, just google an image of an Indian currency note. On one side of the note you will find a box inside which the value of the note is written in words in at least 15 scripts of official Indian languages (I don’t know why it is not 22). At the top, the script is Assamese, as Assamese is the first officially recognized language (script?). Just below it you will find almost identical shapes. That is Bengali. India officially recognises the distinction between these two scripts, which are similar in shape but sound very different at many points. And the standard Assamese alphabet has extra characters which are never Bengali, just as London is never in Germany.

Coming again to the hex codes 09F0 (Raw) and 09F1 (wabo). Neither has anything Bengali in it and, interestingly, 09F1 (which sounds WO or WA when used within words) does not even have a ‘Ra’ sound in it. Thus, with the actual Bengali alphabet one can’t write anything that produces the sound “Watt” as in James Watt; one instead needs to combine three letters, and even then it only sounds like “OOYAT” in Bengali itself.

Therefore Unicode must consider naming the block “Assamese”, which will faithfully describe the block range with 09F0 and 09F1 in it, and replacing all “Bengali” tags with “Assamese” in the code descriptions and vice versa. London is in England and Berlin is in Germany. You just can’t bring London into Germany and then say England is in Germany. You can’t live with a lie or a wrong for too long.
Received on Fri Sep 09 2011 - 18:14:49 CDT
