Re: ASSAMESE AND BENGALI CONTROVERSY IN UNICODE STANDARD ::::: SOLUTIONS from Michael Everson on 2012-07-08 (Unicode Mail List Archive)

From: Michael Everson <everson_at_evertype.com>
Date: Sun, 8 Jul 2012 10:16:59 +0100

On 7 Jul 2012, at 20:39, Satyakam Phukan wrote:

> I am forwarding this article of mine published in my blog/website and with title and other alterations in on-line journal Times of Assam. I had also written a detailed report on the issue forwarded to all concerned including the Unicode Consortium.

Very well. I am concerned. I am also someone who has worked for many years to support lesser-used languages and scripts. In particular I have helped to encode many characters in use in India and neighbouring countries.

> I hope solution comes through the co-operation of all involved in the issue.

The solution is for you to realize that you are simply wrong.

> ASSAMESE AND BENGALI CONTROVERSY IN UNICODE STANDARD ::::: SOLUTIONS

There is no controversy. You are trying to make a controversy. But there is none.

> The Unicode Consortium, a non-Governmental body with headquarters in the U.S.A with Governmental agencies of many countries also as members, have standardised and maintains a Universal Character Set (UCS), i.e. a standard that defines, in one place, all the characters needed for writing the majority of living languages in use on computers.

The National Bodies who participate in ISO also maintain the same standard, through ISO/IEC JTC1/SC2.

> It aims to be, and to a large extent already is, a superset of all other character sets that have been encoded. Unicode (as the UCS is commonly referred to) can access over a million characters of which about 100,000 have already been defined. These include characters for all the world's main languages along with a selection of symbols for various purposes.
>
> REASONS OF DISSENSIONS AMONG THE ASSAMESE :

Have you been elected or appointed to represent the Assamese? Do you, in fact, represent them?

> 1. Non-representation/misrepresentation of the Assamese writing system in the Unicode Standard because the Unicode Consortium and also the Government of India thinks that the current Bengali Code chart will serve the purpose of using the Assamese language in computers.

We think this because it is true. The set of characters encoded in the block called "Bengali" are the same characters which are used in Assamese.

> 2. The script is named as Bengali and all character descriptors in the Unicode Code Chart named as per the Bengali nomenclature and Assamese are forced to use it, neither the Government of India and the Unicode Consortium is willing to do anything positive on it. Both take it as a political issue and cite multiple technical difficulties in solving it, and try to convince the complainants that nothing is wrong with it.

The Latin script is named as Latin, and Germans are forced to use it, and the Irish are forced to use it, and even in India where English is one of the official languages, the Bengalis and the Assamese are forced to use it.

You used the Latin script in your e-mail. But you were writing in English, not in Latin. Why are you not coming out shouting about THE ENGLISH AND LATIN CONTROVERSY IN THE UNICODE STANDARD?

> 3. But the fact remains that the Assamese alphabet "ৰ" (Ro) is being described as Bengali letter "র"(Ro) with middle diagonal, in the Bengali chart of the Unicode Standard.

You must learn something. This isn't optional. There is something you have not learned, and you MUST learn it. Character names are descriptive. Character names are not encyclopaedic. LATIN SMALL LETTER HV could also be described as GOTHIC TRANSLITERATION H V LIGATURE. Or it could be named LATIN SMALL LETTER HWAIR. But it doesn't matter. ARABIC LETTER PEH could be named PERSIAN LETTER PEH. Or it could be named URDU LETTER PEH. But it has not been named that way, because of the history of the encoding and because, essentially, the script was used for Arabic and was later exported elsewhere. But you don't see Iranians and Pakistanis coming to this list and SHOUTING THAT THERE IS A CONTROVERSY, do you?

BENGALI LETTER RA WITH MIDDLE DIAGONAL could be named ASSAMESE LETTER RO. But it hasn't been, because Bengali is spoken by 230 million speakers, and Assamese is spoken by 13 million. Moreover, the script was encoded about two decades ago, when it was brought in on the basis of its standardization in ISCII.
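
The point that a character name is only a fixed descriptive label can be checked directly. The following is a minimal sketch, assuming Python's standard `unicodedata` module; the variable name `names` is only illustrative. It looks up the standardized names of four code points discussed above.

```python
import unicodedata

# Code points for the characters named in this message. The names come
# from the standard itself, whichever language happens to use the letter.
code_points = (0x0195, 0x067E, 0x09F0, 0x09F1)

names = {cp: unicodedata.name(chr(cp)) for cp in code_points}

for cp, name in names.items():
    print(f"U+{cp:04X}  {name}")
```

An Assamese text and a Bengali text store the same code points; the name string is never stored in the text.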

Do you really think it is unfair that, 230 million speakers vs 13 million speakers, the name Bengali has been preferred? Well, tough. Grow up. YOU DON'T KNOW HOW LUCKY YOU ARE to have your script already encoded.

I know people with disorders like cerebral palsy who speak using Blissymbols. Blissymbols is the ONLY language these people are able to communicate with. Has Blissymbols been encoded? Not yet. Why not? Because the users aren't numerous and because someone like me hasn't been able to secure funding to do the work to get the script encoded.

Do you think it is fair for you to be yelling and screaming about something COSMETIC ("a rose by any other name would smell as sweet") like BENGALI LETTER RA WITH MIDDLE DIAGONAL and ASSAMESE LETTER RO? Don't you think you are wasting everybody's time?

Why don't you do something useful, like write a children's book in Assamese and get it published? That will support your language in a real and meaningful way. You can use the Unicode Standard to write your story.

> 4. Assamese alphabet "ৱ" (Wobo) described as Bengali letter "র"(Ro) with lower diagonal, in the Bengali chart of the Unicode Standard.

Yes, that's right. The code chart named Bengali contains characters used in the Bengali language and characters used in the Assamese language.

> 5. Thirteen other Assamese alphabets similarly misrepresented in the Bengali chart of the Unicode Standard.

The correct word is "letters", not "alphabets". An alphabet (varṇamālā) is a collection of letters (akṣara).

And they are not misrepresented. When I write Irish, I do not use IRISH LETTER A and IRISH LETTER B and IRISH LETTER C. And when I write English I do not use ENGLISH LETTER A and ENGLISH LETTER B and ENGLISH LETTER C. All of these are named LATIN LETTER. Similarly, when you wish to write the letter you call WOBO, you use the coded character which happens to be named BENGALI LETTER RA WITH LOWER DIAGONAL.

> 6. Assamese alphabet "ক্ষ" (Khya) is not represented at all in the Bengali Code Chart of the Unicode.

This letter (not "alphabet") is represented by a sequence of characters.
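
As a rough illustration of what "a sequence of characters" means here, the sketch below assumes the conjunct is stored as KA + VIRAMA + SSA (U+0995 U+09CD U+09B7) and uses Python's standard `unicodedata` module; the variable names are only illustrative.

```python
import unicodedata

# Assumed encoding of the conjunct: consonant, virama, consonant.
khya = "\u0995\u09CD\u09B7"

# One written letter on the page, three coded characters in the text.
parts = [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in khya]

for code_point, name in parts:
    print(code_point, name)
```

The rendering engine and font combine the three characters into the single conjunct shape; no separate code point is needed for it.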

> 7. This results in gross Collation Error which occurs when sorting softwares are run in Assamese as because "ৰ" (Ro) and "ৱ" (Wobo) are not in proper place and "ক্ষ" (Khya) is not represented at all in the Bengali Code Chart of the Unicode Standard.

Collation is important, but it is not handled by the code table. Another standard handles collation (The Unicode Collation Algorithm, ISO/IEC 14651) and your requirements can be met there.
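
To make the separation between code chart and sort order concrete, here is a toy sketch in Python. It is not an implementation of the Unicode Collation Algorithm; it only shows that a sort key can place a character anywhere, whatever its code point. The placements used (U+09F0 directly after U+09AF, U+09F1 directly after U+09B2) are assumptions chosen for illustration, and `TAILORED_WEIGHTS` and `sort_key` are hypothetical names.

```python
# Toy tailoring table: illustrative weights only, NOT real UCA data.
# Assumption: RO sorts just after YA, and WOBO just after LA.
TAILORED_WEIGHTS = {
    "\u09F0": 0x09AF + 0.5,  # RA WITH MIDDLE DIAGONAL, placed after YA
    "\u09F1": 0x09B2 + 0.5,  # RA WITH LOWER DIAGONAL, placed after LA
}

def sort_key(word):
    """Map each character to its tailored weight, else its code point."""
    return [TAILORED_WEIGHTS.get(ch, ord(ch)) for ch in word]

letters = ["\u09F1", "\u09B2", "\u09F0", "\u09AF", "\u09B9"]

by_code_point = sorted(letters)               # raw code chart order
by_tailoring = sorted(letters, key=sort_key)  # order from the key function
```

In raw code point order the two letters fall after HA (U+09B9); with the key function they move to the assumed positions, and no character has to be re-encoded or renamed for that to happen.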

> SOLUTIONS UNDER CONSIDERATON :
>
> 1.RENAMING OF THE SCRIPT AND ALTERNATIVE NOMENCLATURE OF THE CHARACTER DESCRIPTORS

I am going to say this ten times, so that you understand it: The block name and character names cannot be changed. The block name and character names cannot be changed. The block name and character names cannot be changed. The block name and character names cannot be changed. The block name and character names cannot be changed. The block name and character names cannot be changed. The block name and character names cannot be changed. The block name and character names cannot be changed. The block name and character names cannot be changed. The block name and character names cannot be changed.

Do you understand? They cannot be changed. It is impossible to change them.

> This is stated in the beginning because, the Government of India seems more interested in solving it that way. Renaming of the current Bengali script in the Unicode Standard with a name acceptable to all has been proposed by many. The problem with the renaming solution is there, both in the Bengali and Assamese side and most important a technical problem is associated with it.

There are no technical problems with the character encoded at U+09F0. The only problem is that you like to call this RO or ASSAMESE LETTER RO and the standardized (and unchangeable) character name is BENGALI LETTER RA WITH MIDDLE DIAGONAL. This is not a technical problem. This is a matter of cosmetics. It might be a matter of national pride, but it is trivial and unimportant. Since the character names cannot ever be changed, your campaign cannot succeed.

Please find another way to defend and support your language and culture.

> A. Will the Bengali community agree to it, considering that the present Bengali code chart is serving their purpose quite well. The Bengali community is there in two sovereign countries India and Bangladesh.

It doesn't matter. The character names cannot be changed.

> B. The major problem lies on the Assamese side, will the renaming be limited to the renaming of the name of the Script and Code chart only or will it include the misrepresented character descriptors' nomenclature also. For example the following Assamese characters have Bengali descriptors, different from how they would have been described in Assamese.

It doesn't matter. The character names cannot be changed.

> Supposing renaming is taken up as the best solution for solving the controversy

Renaming will not happen. It cannot be taken up. Moreover, there is no controversy, except the one that you want to create. If there are other people in Assam who are also complaining about this, you CAN help them. Here is how:

Read what I have written above and learn from it. Then go teach your colleagues in Assam.

> then the whole current Bengali Code Chart of the Unicode Standard will have to have alternative nomenclature beginning with the title of the script like ASSAMESE AND BENGALI and the individual characters will also have alternative character descriptors like this :
>
> U+09B8 "স" e0 a6 b8 = BENGALI LETTER SA / ASSAMESE LETTER XA (DONTIYA)
>
> U+09AF "য" e0 a6 af = BENGALI LETTER YA / ASSAMESE LETTER ZA (ANTUSTYA)

Character names cannot be changed.

> If such an alteration is possible

Character names cannot be changed.

> and every character is given both the Assamese and Bengali descriptors and the script renamed as per an acceptable name and the displaced and missing Assamese characters "ৰ" (Ro) and "ৱ" (Wobo) and "ক্ষ" (Khya) put in proper place in the chart, for proper collation the problem may be solved.

Collation is not handled by the code chart. It is handled by a different standard. We *can* support Assamese collation. We *cannot* change character names.

> But as per the basic principle of a Unique Code, one particular entity can have one identifier, in this case around fifteen characters will have one identifier for two entities.

No. You are asking for a character name change for 15 characters, and we *cannot* change character names.

> If Unicode Consortium or the Indian Government thinks that this basic principle of Unique Codification can be violated then the matter may be acceptable to the Assamese and Bengali alike.

The only unique identifier you need is U+09F0 and the other code positions your language uses.
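
A short sketch of what that identifier amounts to in practice, using only built-in Python string and bytes methods (the variable names are illustrative): the code point and its UTF-8 bytes are what software stores and compares, and the character name plays no part in either.

```python
ro = "\u09F0"  # the letter at code position U+09F0
sa = "\u09B8"  # the letter at code position U+09B8

# The code point is the identifier; UTF-8 is one way of serializing it.
ro_id = f"U+{ord(ro):04X}"
ro_utf8 = ro.encode("utf-8").hex(" ")
sa_utf8 = sa.encode("utf-8").hex(" ")

print(ro_id, ro_utf8)
print(f"U+{ord(sa):04X}", sa_utf8)
```

The bytes computed for U+09B8 are the same `e0 a6 b8` listed in the quoted proposal above.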

> 2. SEPARATE SLOT/RANGE FOR THE ASSAMESE SCRIPT

This will NEVER HAPPEN because the Bengali and Assamese use the SAME SCRIPT. It may be true that 13 million people call some of the letters by one name and 230 million people call some of the letters by a different name, but it is still the SAME SCRIPT.

> If renaming in the way described above is not possible, then allocation of a separate slot/range for the Assamese Script remains the only solution.

No, because this would mean that you would never know what script an Assamese text was written in, because the scripts are IDENTICAL. This would cause a security problem that would affect 243 million people.

> Which is perhaps easier for the Unicode Consortium to do. Government of Assam has also moved the Government of India seeking a separate slot/range for the Assamese script. Allocation of a separate slot/range for the Assamese Script will mean Unicode Consortium allowing and accepting duplication of characters. The Unicode Consortium has already allowed and accepted not only duplication but in case of some of the characters triplication of characters in the three major European writing systems viz. Cyrillic, Greek and Latin.

This analogy is false. Only *one* letter has the same shape in all three of those scripts, O o. And those letters are found also in the Deseret script.

> Consequently in the Unicode Standard has more than the following number of duplicate characters :
>
> a=2, A=3, B=3, c=2, C=2, e=2, E=3, H=3, i=2, I=3, j=2, J=2, K=2, M=3, N=2, o=2, O=3, p=2, P=3, s=2, S=2, T=2, x=2, X=3, y=2, Y=2 and Z=2
>
> Here only there are a total of 63 (sixty three characters) duplicated between the three major European writing systems the Cyrillic, Greek and Latin, the actual number is more than this.
> Number wise duplication of characters will be perhaps much less than this, if Bengali and Assamese scripts are duplicated and allocated separate slots/ range for themselves.
>
> CONCLUSION :
>
> The solution therefore lies in duplicity.

And that would cause chaos and confusion and internet theft on a massive scale. It would be the greatest disservice we could do to the people of Assam. It would be monstrously irresponsible.
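
The kind of confusion at stake can be sketched with lookalikes that already exist. The example below assumes Python's standard `unicodedata` module; the word "assam" is only a sample string, and taking the first word of a character name as its script is a rough heuristic, not the real Unicode Script property.

```python
import unicodedata

genuine = "assam"       # all Latin letters
spoofed = "\u0430ssam"  # first letter is CYRILLIC SMALL LETTER A

def scripts_used(text):
    """Rough heuristic: first word of each character name as its script."""
    return sorted({unicodedata.name(ch).split()[0] for ch in text})

looks_alike_but_differs = genuine != spoofed

print(scripts_used(genuine))
print(scripts_used(spoofed))
```

The two strings can look the same on screen yet compare as different. Encoding an identical script a second time would create this situation for every letter of it at once.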

> In the first option there is going to be duplicity of the Unique Codes meaning single code for two entities and in the second option there is going to be duplicity of characters meaning two characters of the same appearance.

You don't want that.

> The Unicode Consortium and the Government of India has to choose between the two. Duplicity of characters is already there in the Unicode Standard but whether duplicity of Unique Codes are there, or whether it is acceptable to the experts, whether it is justified, it is not known, because duplicity itself means loss of uniqueness of any Unique Code.

No. The solution is for you to learn why you have been wrong. Your thinking and your logic are wrong. The names of the script and the characters *aren't* meaningful.

> For full details on the issue go to this webpage
> http://drsatyakamphukan.wordpress.com/assamese-and-unicode

You're wasting your time and energy. You seem to have a lot of energy: Please, please -- find another way to serve the Assamese language.

> Dr Satyakam Phukan
> General Surgeon

Michael Everson * http://www.evertype.com/formal.html
Expert in the world's writing systems
Received on Sun Jul 08 2012 - 04:20:39 CDT
