ASSAMESE AND BENGALI CONTROVERSY IN UNICODE STANDARD ::::: SOLUTIONS

From: Satyakam Phukan <sphukan2011_at_yahoo.co.uk>
Date: Sat, 7 Jul 2012 20:39:34 +0100 (BST)

I am forwarding this article of mine, published on my blog/website and, with a changed title and other alterations, in the online journal Times of Assam. I have also written a detailed report on the issue and forwarded it to all concerned, including the Unicode Consortium. I hope a solution comes through the co-operation of all involved in the issue.

ASSAMESE AND BENGALI CONTROVERSY IN UNICODE STANDARD ::::: SOLUTIONS

The Unicode Consortium, a non-governmental body headquartered in the U.S.A. whose members include governmental agencies of many countries, has standardised and maintains a Universal Character Set (UCS), i.e. a standard that defines, in one place, all the characters needed for writing the majority of living languages in use on computers. It aims to be, and to a large extent already is, a superset of all other character sets that have been encoded. Unicode (as the UCS is commonly referred to) can address over a million characters, of which about 100,000 have already been defined. These include characters for all the world's main languages along with a selection of symbols for various purposes.

REASONS FOR DISSENSION AMONG THE ASSAMESE :

1. Non-representation/misrepresentation of the Assamese writing system in the Unicode Standard, because the Unicode Consortium and also the Government of India think that the current Bengali code chart will serve the purpose of using the Assamese language on computers.

2. The script is named Bengali, all character descriptors in the Unicode code chart are named as per the Bengali nomenclature, and the Assamese are forced to use it. Neither the Government of India nor the Unicode Consortium is willing to do anything positive about it. Both take it as a political issue, cite multiple technical difficulties in solving it, and try to convince the complainants that nothing is wrong with it.

3. But the fact remains that the Assamese letter "ৰ" (Ro) is described as Bengali letter "র" (Ro) with middle diagonal in the Bengali chart of the Unicode Standard.

4. The Assamese letter "ৱ" (Wobo) is described as Bengali letter "র" (Ro) with lower diagonal in the Bengali chart of the Unicode Standard.

5. Thirteen other Assamese letters are similarly misrepresented in the Bengali chart of the Unicode Standard.

6. The Assamese letter "ক্ষ" (Khya) is not represented at all in the Bengali code chart of the Unicode Standard.

7. This results in gross collation errors when sorting software is run on Assamese text, because "ৰ" (Ro) and "ৱ" (Wobo) are not in their proper places and "ক্ষ" (Khya) is not represented at all in the Bengali code chart of the Unicode Standard.

SOLUTIONS UNDER CONSIDERATION :

1. RENAMING OF THE SCRIPT AND ALTERNATIVE NOMENCLATURE OF THE CHARACTER DESCRIPTORS

This is stated first because the Government of India seems more interested in solving the issue this way. Renaming the current Bengali script in the Unicode Standard with a name acceptable to all has been proposed by many. The renaming solution has problems on both the Bengali and the Assamese side and, most importantly, a technical problem is associated with it.

A. Will the Bengali community agree to it, considering that the present Bengali code chart is serving their purpose quite well? The Bengali community lives in two sovereign countries, India and Bangladesh.

B. The major problem lies on the Assamese side: will the renaming be limited to the name of the script and code chart only, or will it also include the nomenclature of the misrepresented character descriptors? For example, the following Assamese characters have Bengali descriptors, different from how they would have been described in Assamese.

Code point  Char  UTF-8 (hex)  Unicode name (Bengali)                  Assamese name                            Assamese IPA
U+099A      চ     e0 a6 9a     BENGALI LETTER CA                       ASSAMESE LETTER SA (PRATHAM)             s
U+099B      ছ     e0 a6 9b     BENGALI LETTER CHA                      ASSAMESE LETTER SA (DWITIYA)             s
U+099F      ট     e0 a6 9f     BENGALI LETTER TTA                      ASSAMESE LETTER TA (MURDHENYA)           t
U+09A0      ঠ     e0 a6 a0     BENGALI LETTER TTHA                     ASSAMESE LETTER THA (MURDHENYA)          th
U+09A1      ড     e0 a6 a1     BENGALI LETTER DDA                      ASSAMESE LETTER DA (MURDHENYA)           d
U+09A2      ঢ     e0 a6 a2     BENGALI LETTER DDHA                     ASSAMESE LETTER DHA (MURDHENYA)          dh
U+09A3      ণ     e0 a6 a3     BENGALI LETTER NNA                      ASSAMESE LETTER NA (MURDHENYA)           n
U+09AF      য     e0 a6 af     BENGALI LETTER YA                       ASSAMESE LETTER ZA (ANTUSTYA)            z
U+09B6      শ     e0 a6 b6     BENGALI LETTER SHA                      ASSAMESE LETTER XA (TALOBYA)             x
U+09B7      ষ     e0 a6 b7     BENGALI LETTER SSA                      ASSAMESE LETTER XA (MURDHENYA)           x
U+09B8      স     e0 a6 b8     BENGALI LETTER SA                       ASSAMESE LETTER XA (DONTIYA)             x
U+09C0      ী     e0 a7 80     BENGALI VOWEL SIGN II                   ASSAMESE VOWEL SIGN I (DIRGHA)           i
U+09C2      ূ     e0 a7 82     BENGALI VOWEL SIGN UU                   ASSAMESE VOWEL SIGN U (DIRGHA)           u
U+09CD      ্     e0 a7 8d     BENGALI SIGN VIRAMA                     ASSAMESE SIGN REF
U+09CE      ৎ     e0 a7 8e     BENGALI LETTER KHANDA TA                ASSAMESE LETTER HASANTA TA               t
U+09D7      ৗ     e0 a7 97     BENGALI AU LENGTH MARK                  ASSAMESE VOWEL SIGN AU (TIBETO-BURMAN)
U+09DC      ড়     e0 a7 9c     BENGALI LETTER RRA                      ASSAMESE LETTER RA (DORE)
U+09DF      য়     e0 a7 9f     BENGALI LETTER YYA                      ASSAMESE LETTER YA                       iɒ
U+09F0      ৰ     e0 a7 b0     BENGALI LETTER RA WITH MIDDLE DIAGONAL  ASSAMESE LETTER RA                       r
U+09F1      ৱ     e0 a7 b1     BENGALI LETTER RA WITH LOWER DIAGONAL   ASSAMESE LETTER WA                       w
U+09FA      ৺     e0 a7 ba     BENGALI ISSHAR                          ASSAMESE SIGN SWARGIO (LATE/HEAVENLY)

Supposing renaming is taken up as the best solution for solving the controversy, the whole current Bengali code chart of the Unicode Standard will need alternative nomenclature, beginning with a script title like ASSAMESE AND BENGALI, and the individual characters will also need alternative character descriptors like this :

U+09B8 "স" e0 a6 b8 = BENGALI LETTER SA / ASSAMESE LETTER XA (DONTIYA)
U+09AF "য" e0 a6 af = BENGALI LETTER YA / ASSAMESE LETTER ZA (ANTUSTYA)

If such an alteration is possible, every character is given both the Assamese and the Bengali descriptor, the script is renamed with an acceptable name, and the displaced and missing Assamese characters "ৰ" (Ro), "ৱ" (Wobo) and "ক্ষ" (Khya) are put in their proper places in the chart for proper collation, the problem may be solved. But as per the basic principle of a unique code, one particular entity can have one identifier; in this case around fifteen characters will have one identifier for two entities. If the Unicode Consortium or the Indian Government thinks that this basic principle of unique codification can be violated, then the matter may be acceptable to the Assamese and Bengali alike.

2. SEPARATE SLOT/RANGE FOR THE ASSAMESE SCRIPT

If renaming in the way described above is not possible, then allocation of a separate slot/range for the Assamese script remains the only solution, which is perhaps easier for the Unicode Consortium to do. The Government of Assam has also approached the Government of India seeking a separate slot/range for the Assamese script.

Allocation of a separate slot/range for the Assamese script will mean the Unicode Consortium allowing and accepting duplication of characters. The Unicode Consortium has already allowed and accepted not only duplication but, in the case of some characters, triplication of characters in the three major European writing systems, viz. Cyrillic, Greek and Latin. Consequently the Unicode Standard has at least the following numbers of duplicate characters :

a=2, A=3, B=3, c=2, C=2, e=2, E=3, H=3, i=2, I=3, j=2, J=2, K=2, M=3, N=2, o=2, O=3, p=2, P=3, s=2, S=2, T=2, x=2, X=3, y=2, Y=2 and Z=2

Here alone there are a total of 63 (sixty-three) characters duplicated between the three major European writing systems, Cyrillic, Greek and Latin; the actual number is more than this. Number-wise, the duplication of characters will perhaps be much less than this if the Bengali and Assamese scripts are duplicated and allocated separate slots/ranges for themselves.
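
Two of the points above can be checked with a short Python sketch: that look-alike letters in Latin, Greek and Cyrillic carry separate code points, and that a plain code-point sort places "ৰ" after "ল". This is only an illustration using the standard `unicodedata` module; the helper name `describe` is hypothetical, and the Assamese alphabetical order assumed in the comments (ৰ between য and ল) is stated as an assumption, not taken from the Unicode charts. Real sorting software may use tailored collation rules rather than raw code-point order.

```python
import unicodedata


def describe(ch):
    """Return (code point, UTF-8 bytes in hex, Unicode character name)."""
    utf8_hex = " ".join(f"{b:02x}" for b in ch.encode("utf-8"))
    return (f"U+{ord(ch):04X}", utf8_hex, unicodedata.name(ch))


# Capital letters of the same appearance, encoded once per script.
lookalikes = ["\u0041", "\u0391", "\u0410"]  # Latin, Greek, Cyrillic
for ch in lookalikes:
    print(ch, *describe(ch))

# The two letters discussed in the article, as currently encoded.
for ch in ["\u09F0", "\u09F1"]:
    print(ch, *describe(ch))

# Naive sort by code point. Assumption: Assamese alphabetical order
# places U+09F0 between U+09AF and U+09B2; code-point order puts it last.
letters = ["\u09F0", "\u09B2", "\u09AF"]
by_code_point = sorted(letters)
print([f"U+{ord(ch):04X}" for ch in by_code_point])
```

The three look-alike capitals come back with three different code points and names, which is the kind of duplication the second solution relies on.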

CONCLUSION :

The solution therefore lies in duplication. In the first option there is going to be duplication of the unique codes, meaning a single code for two entities, and in the second option there is going to be duplication of characters, meaning two characters of the same appearance. The Unicode Consortium and the Government of India have to choose between the two. Duplication of characters is already there in the Unicode Standard, but whether duplication of unique codes exists, whether it is acceptable to the experts, and whether it is justified is not known, because duplication itself means loss of uniqueness of any unique code.

For full details on the issue go to this webpage
http://drsatyakamphukan.wordpress.com/assamese-and-unicode

Dr Satyakam Phukan
General Surgeon
Jorpukhuripar, Uzanbazar
Guwahati, Assam
P.I.N : 781001
Phone: 99540 46357
Website : http://drsatyakamphukan.wordpress.com

Received on Sat Jul 07 2012 - 14:43:29 CDT
