RE : ASSAMESE AND BENGALI CONTROVERSY IN UNICODE STANDARD ::::: SOLUTIONS from Satyakam Phukan on 2012-07-10 (Unicode Mail List Archive)

From: Satyakam Phukan <sphukan2011_at_yahoo.co.uk>
Date: Tue, 10 Jul 2012 10:10:18 +0100 (BST)

Here are my replies to the relevant responses to my post on the subject "Assamese and Bengali controversy in Unicode ::: Solutions".

*Mr Ewell
1. The names of characters do not cause any kind of technical problem in using them. Letters called “Latin” in Unicode are used to write hundreds of languages that are not Latin. Different languages sometimes call the same letter by different names, and this is also not a technical problem.

*Mr Kolehmainen
The various scripts to write the languages of Europe are indeed different scripts, some of which are used to write many different languages.

*Mr Shoulson
If that truly is the concern here, then surely English should feel at least as slighted. The word "ENGLISH" appears nowhere in the Unicode database as the description of any character. Nor does "ITALIAN", "DUTCH", or "FINNISH". "FRENCH" appears only in U+20A3 FRENCH FRANC SIGN (a currency symbol) and in U+1F35F FRENCH FRIES. Even "AMERICAN" shows up only in the emoji U+1F3C8 AMERICAN FOOTBALL. I think this demonstrates that having a name on a character in Unicode does not indicate anything about how literate a language is or should be perceived. Conversely, whatever script the Phaistos disc is written in has its entire known literature consisting of a single document, but it gets a whole section in the standard.

*Mr Everson
The Latin script is named as Latin, and Germans are forced to use it, and the Irish are forced to use it, and even in India where English is one of the official languages, the Bengalis and the Assamese are forced to use it. You used the Latin script in your e-mail. But you were writing in English, not in Latin. Why are you not coming out shouting about THE ENGLISH AND LATIN CONTROVERSY IN THE UNICODE STANDARD? BENGALI LETTER RA WITH MIDDLE DIAGONAL could be named ASSAMESE LETTER RO. But it hasn't been, because Bengali is spoken by 230 million speakers, and Assamese is spoken by 13 million.
Moreover, the script was encoded about two decades ago, because it had been brought in because of its standardization in ISCII. Do you really think it is unfair that, 230 million speakers vs 13 million speakers, the name Bengali has been preferred? Well, tough. Grow up. YOU DON'T KNOW HOW LUCKY YOU ARE to have your script already encoded.

Reply : The answer to these responses is exactly and accurately provided by Mr Everson. I am talking about "THE ENGLISH AND LATIN CONTROVERSY IN THE UNICODE STANDARD". The Latin script developed in the ancient Roman civilisation, and two nationalities are inheritors of the Roman heritage: the Italians and the Romanians. The number of English speakers using the Latin script is far greater than that of the Italians and the Romanians put together. How would it be if the Latin script were called the English script, as many ignorant people in third world countries already call it?

This is exactly what has happened: the script that historically developed in ancient Assam, then called Kamrup, is internationally named Bengali. The Bengalis got it from the ancient Assamese and adapted it to their own usage, because the Assamese use it in a different way. It is worth mentioning that Bengali may be considered a language of Sanskrit origin, but Assamese is not. In the process they have omitted one important letter, making it phonetically incomplete. Is it right for any responsible international organisation, be it Unicode or ISO, to misrepresent something on the ground that one community is larger and more influential than the other? I explained this truth in my report sent to the Unicode Consortium in November last year; it can be found here and also here.

The statements of Mr Michael Everson, discriminating against a smaller linguistic group in favour of a larger one, are in clear violation of the provisions enshrined in the UNIVERSAL DECLARATION ON LINGUISTIC RIGHTS, links to which are provided by Mr Everson himself on his personal website.

*Mr Everson
I'd like to say one more thing about this waste of time.

> Dr Satyakam Phukan
> General Surgeon
> Jorpukhuripar, Uzanbazar
> Guwahati, Assam

Dr Phukan is clearly making a lot of noise on his own behalf. I do not believe he speaks for most Assamese. In fact, here is what I believe: The Assamese are already using Unicode and printing newspapers and magazines and books and posters and all sorts of things and are not worried about this cosmetic issue. And they have been doing so for years.

Reply : Mr Everson is grossly misinformed about the status of this issue among the Assamese people. Not only the conscious public but even the Government of the state of Assam is seized of this issue. In February this year the Government of Assam requested the Government of India to move the Unicode Consortium for a separate slot/range/block for the Assamese script. You can find the official communication here. A proposed Code Chart has been prepared and sent along with it; you can find the proposed Code Chart here.

*Mr Ewell
2. Latin, Greek, and Cyrillic are different scripts, not just different alphabets within the same script, and the analogy with Bengali/Assamese is inappropriate. See Technical Note #26 for more information.

Reply : I have read Technical Note #26 and have come to know why they are not unified. I am highlighting the presence of a large number of duplicate characters between these interrelated scripts. Although you do not accept Assamese and Bengali as two different scripts, they are so, as I have described in detail in my report sent to the Unicode Consortium in November last year; it can be found here and also here.

*Mr Everson
This analogy (Greek-Latin-Cyrillic duplication) is false. Only *one* letter has the same shape in all three of those scripts, O o. And those letters are found also in the Deseret script. And that would cause chaos and confusion and internet theft on a massive scale.
It would be the greatest disservice we could do to the people of Assam. It would be monstrously irresponsible.

Reply : Mr Everson's assertion is totally false; there is massive duplication of characters between the Greek, Latin and Cyrillic scripts. The Deseret script is used by the polygamous Mormon sect of Utah; I do not find it relevant to the issue in question. To see the extent of duplication of characters between the Greek, Latin and Cyrillic scripts, see this Chart. This duplication is exploited by unscrupulous elements to indulge in phishing and other nefarious activities. If the views expressed above are to be followed, it would mean that allowing duplication between the Greek, Latin and Cyrillic scripts is a great and responsible service to the entire humanity, but allowing duplication between Assamese and Bengali "would be the greatest disservice we could do to the people of Assam. It would be monstrously irresponsible."

*Mr Ewell
3. The order of characters in a code chart does not cause any kind of collation problem, because binary code point order is never assumed to be correct for language-appropriate collation.

*Mr Everson
Collation is important, but it is not handled by the code table. Another standard handles collation (The Unicode Collation Algorithm, ISO/IEC 14651) and your requirements can be met there.

*Mr Kolehmainen
Your misunderstanding related to collation is even more surprising. The sequence of the code points is not the evident basis for the collation, nor does the default collation (as defined for the full UCS covering multilingual, multiscript texts by the Unicode Collation Algorithm, UCA, and the ISO/IEC 14651) apply as such to all languages written in the same script. The best examples of this are the wildly different collation sequences of the many languages written using the Latin script.
The Unicode Common Locale Data Repository (CLDR) is an excellent vehicle to publish the proper collation sequence for any given language (and script) and region combination.

Reply : I welcome your response clearing up my misconception regarding collation, but collation errors have been a problem with Assamese and still persist; why, the experts can answer better. There is a topic on this subject in the Unicode forum, started by someone anonymous. One character of the Assamese alphabet is missing from the Unicode Code Chart; can that be a reason?

*Mr Everson
The National Bodies who participate in ISO also maintain the same standard, through ISO/IEC JTC1/SC2.

*Mr Kolehmainen
You either ignore or are surprisingly unaware of the fact that the Unicode Standard is developed in co-operation with the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC), specifically with their Joint Technical Committee 1 for Information Technology (JTC1), more specifically with its SC2 (Coded Character Sets) / WG2 (Universal Coded Character Set) that produces the ISO/IEC 10646. The Unicode Consortium is thus not at liberty to make changes to the standard on its own.

Reply : Assamese is represented in the ISO by the codes "as" and "asm". Further information is provided below.

*Mr Wordingham
Isn't the correct way of translating 'BENGALI' in Character names into Assamese to use the word normally used to mean Assamese? What problems does this approach leave? Don't you think the Mons are offended by the Mon script being called the 'Myanmar' script?

Reply : The translation and transliteration systems of Assamese and Bengali are quite different. Bengali follows Sanskrit, but Assamese is very different. Please see here and also here, where I have described this in as much detail as I can.
Since the basic difference between Bengali and Assamese has been ignored, the transliteration of Assamese as per ISO 15919 and Unicode is totally erroneous; please see these charts (chart 1, chart 2 and chart 3) and compare them with this chart, which shows the actual transliteration of Assamese.

The case of Mon and Burmese (Myanmarese) is different from that of Assamese and Bengali. The Mons are older inhabitants of Myanmar than the Burmese, called Bamah. The script that all the communities of Myanmar use is derived or copied from the southern Indian scripts. Similar scripts are used by Tai groups, many of which are in Assam. The script encoded as Bengali in the Unicode Standard originated in ancient Kamrup but has been encoded in the name of the borrower, the Bengalis, who are now greater in number than the rightful inheritors.

*Mr Everson
I am going to say this ten times, so that you understand it: The block name and character names cannot be changed. The block name and character names cannot be changed. The block name and character names cannot be changed. The block name and character names cannot be changed. The block name and character names cannot be changed. The block name and character names cannot be changed. The block name and character names cannot be changed. The block name and character names cannot be changed. The block name and character names cannot be changed. The block name and character names cannot be changed.

Reply : The attitude reflected in these continuous assertions reminds one of another controversy in the Unicode Standard, relating to Khmer. The problem may be different, as the problems of two groups cannot always be similar. But the attitude reflected is similar. Readers can read this piece of writing by a Khmer-speaking person describing their experience with Unicode; it is interesting and may prompt a lot of introspection in those involved.

*Mr Everson
Do you think it is fair for you to be yelling and screaming about something COSMETIC ("a rose by any other name would smell as sweet") like BENGALI LETTER RA WITH MIDDLE DIAGONAL and ASSAMESE LETTER RO? Don't you think you are wasting everybody's time?

Reply : Experts can say how much of the problem cited by us is technical and how much cosmetic, but for us the main grievance is the misrepresentation of facts, which has resulted in an incomplete and erroneous representation of our script. Another example is the description of the "Bengali" script as being similar to Devanagari in Unicode 6.1. Not a single character of Assamese is similar to Devanagari except the "i" sign. The similarity is actually with the Tibetan alphabet: both of these scripts use angles, more properly acute angles, in their characters. No other script that evolved in the Indian subcontinent uses forms with acute angles; this fact was communicated to the Unicode Consortium in the report sent in November last year. The chart showing the similarity between Assamese and Tibetan compared to Devanagari can be found here.

Just as the UNIVERSAL DECLARATION ON LINGUISTIC RIGHTS says "All languages are equal", all responsible international organisations taking up the task of codifying the world's languages must see to it that no linguistic group is misrepresented, discriminated against, neglected or deprived of its rightful place on the ground of being fewer in number or less influential, or because of being neglected by the central government of a large country where it may be in a minority.

Dr Satyakam Phukan
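
Editorial illustration of the collation point quoted earlier in this message (binary code point order is never assumed to be the correct collation order). This is a minimal Python sketch and is not taken from the thread. U+09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL (the Assamese ro) sits near the end of the block, so a plain code point sort places it after U+09B2 BENGALI LETTER LA. The function tailored_key below is hypothetical: it assumes, for illustration only, that ro should sort at the position of U+09B0 BENGALI LETTER RA. A real application would more likely rely on the Unicode Collation Algorithm with a CLDR tailoring than on a hand-written key.

```python
# Code points used in this sketch (all from the Unicode Bengali block).
MA = "\u09AE"           # BENGALI LETTER MA
RA = "\u09B0"           # BENGALI LETTER RA
LA = "\u09B2"           # BENGALI LETTER LA
ASSAMESE_RO = "\u09F0"  # BENGALI LETTER RA WITH MIDDLE DIAGONAL


def tailored_key(word):
    """Hypothetical sort key: treat Assamese ro as if it sat at RA's position.

    This is an assumption made for illustration, not a published collation.
    """
    return [ord(RA) if ch == ASSAMESE_RO else ord(ch) for ch in word]


# Single-letter "words" keep the example easy to follow.
words = [LA, ASSAMESE_RO, MA]

# Plain sort compares code points, so ro (U+09F0) lands last.
by_code_point = sorted(words)

# Tailored sort moves ro between ma and la.
by_tailoring = sorted(words, key=tailored_key)

print([f"U+{ord(w):04X}" for w in by_code_point])
print([f"U+{ord(w):04X}" for w in by_tailoring])
```

The two lists differ only in where U+09F0 lands, which is the sense in which the position of a character in the code chart and its position in a language's alphabetical order are separate questions.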
Received on Tue Jul 10 2012 - 04:14:18 CDT
