RE: Glaring mistake in the code list for South Asian Script

From: Doug Ewell <>
Date: Thu, 08 Sep 2011 09:50:47 -0700

There is a block labeled "Bengali" which contains characters used for
writing several languages, including Bengali, Assamese, and perhaps

"Bengali" as the name of this block is just a name. It does not affect
the usability of the block for writing languages other than Bengali, and
does not make any statement about the suitability of any given character
for writing any given language. There are, however, stability policies
that prevent Unicode from changing the name.
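
To make that concrete, here is a small Python sketch (my own illustration, not anything normative) that looks up the formal names of the three code points mentioned in the original message using the standard `unicodedata` module. The helper names `describe` and `in_bengali_block` are made up for this example. The names returned depend on the Unicode data bundled with your Python build, but character names are covered by the same stability policies and should not differ between versions.

```python
import unicodedata

# Code points discussed in the quoted message.
CODE_POINTS = [0x09B0, 0x09F0, 0x09F1]

# The block labeled "Bengali" spans U+0980..U+09FF.
BLOCK_START, BLOCK_END = 0x0980, 0x09FF


def describe(cp):
    """Return 'U+XXXX NAME' for a code point (hypothetical helper)."""
    return f"U+{cp:04X} {unicodedata.name(chr(cp))}"


def in_bengali_block(cp):
    """True if the code point lies in the block labeled 'Bengali'."""
    return BLOCK_START <= cp <= BLOCK_END


for cp in CODE_POINTS:
    print(describe(cp), "| in block:", in_bengali_block(cp))

# The label on the block has no effect on encoding: text that uses
# U+09F0 and U+09F1 round-trips through UTF-8 like any other text.
sample = chr(0x09F0) + chr(0x09F1)
assert sample.encode("utf-8").decode("utf-8") == sample
```

The word "BENGALI" in each name is only an identifier. Nothing in the lookup or the encoding step restricts these characters to any particular language.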

The relationships between languages and scripts, between names of
languages and names of scripts, and between the set of characters used
for writing a single language in a script and the set used for writing
any language in that script, are usually not 1-to-1, and Unicode never
assumes that they are.

Doug Ewell | Thornton, Colorado, USA | RFC 5645, 4645, UTN #14 | @DougEwell
> -------- Original Message --------
> Subject: Glaring mistake in the code list for South Asian Script
> From: delex r <>
> Date: Thu, September 08, 2011 5:24 am
> To:
> Here I would like to point out an absurdity in the code lists that where “Bengali” has been recorded as a South Asian script which is actually a misnomer. In fact Bengali is a language that uses a script that has not yet been named. The proposed name “Purvanagari” is an invention only as there is no consensus regarding accepting that term. The script used for writing Bengali language is also used for writing Assamese language or you can say vice-versa. But naming it as “Bengali” script undermines the separate existence of Standard Assamese Alphabet List that contains special characters unique to Assamese language only. Example, the character allotted with Hex code 09F0 is Raw {Pronounced as English ‘raw’ (=uncooked)} and the character allotted with Hex code 09F1 is Wabo {Pronounced almost same as you would pronounce ‘Wabo’ in English}. There is nothing like 09F0 or 09F1 in Bengali language. These two characters are part of Assamese Alphabet list and are used for spelling regular Assamese words. Terming these letters as Bengali specific addition is just like copy right violation, plagiarism, or more rudely call it a theft. One can never find these letters in any Bengali writings, neither in ancient nor in modern. Unicode must consider removing these letters from what it calls the Bengali script and recognise the fact that the non-presence of the above letters in that script makes it a phonetically deficient one. I would rather suggest that the code range 0980 to 09FF should be named as “Assamese” script because then only one can definitely include 09F0 and 09F1 as part of this range. In that case the only errata will be at the character with Hex code 09B0 which is described as Bengali letter Ra (phonetically same as 09F0 above) and can be resolved by aptly calling it a Bengali specific addition. The use of character 09F0 is older than 09B0 to spell words that requires bringing out the ‘Raw’ sound. For some time in the 18th century the character 09B0 was used for writing Assamese language as evident from some print materials at that time but later the use of that character 09B0 was abandoned in Assamese. This suggest that that the character 09B0 was for some time part of the Assamese script and should have no objection by any one now to be part of the “Assamese” script which I suggest for Unicode. One point worth mentioned here that every word and sound pronounced in Bengali language can be inscribed with the available standard character set with Assamese language whereas with phonetically deficient Bengali script one shall not be able to inscribe thousands of Assamese words. This point alone can put the name “Assamese” script in precedence over the “Bengali” script for the purpose of nomenclature of the South Asian scripts.
Received on Thu Sep 08 2011 - 11:54:18 CDT
