L2/08-207

Title: Towards property classification for Brahmi-derived scripts.
Source: Ken Whistler
Date: 2008-May-05

This document contains a first proposal for how to codify properties that would help in aksara determination and other processing for Brahmi-derived scripts. This is the follow-up on my L2 action item 110-A032 to "create new draft properties that support the determination of orthographic clusters in Indic scripts."

Updated 2008-04-07 for Unicode 5.1.

This first pass just uses Devanagari, and pulls out the classes worthy of note. See below for the details.

My take on properties for this would be to create a major enumerated type for the structural parts -- something along the lines of:

Indic_Syllabic_Category = [vowel, consonant, matra, virama, bindi, visarga, nukta, avagraha, other]

The value isc=other could just be the default value for all other characters which didn't fit into one of the specifically Indic script types, including vowels and consonants in non-Indic scripts.

The Indic_Syllabic_Category alone wouldn't be enough to define aksaras, as you would need to include various diacritic marks, for example -- but those would depend on other, already-defined properties. I see Indic_Syllabic_Category as only supplying the additional information not derivable from other properties.

Then for matras specifically, a placement property, along the lines of:

Matra_Placement = [right, left, left_and_right, bottom, top, bottom_and_top, right_mixed, left_mixed, other]

Of those, bottom, top, and bottom_and_top would be gc=Mn, while the other values would correspond to gc=Mc. left_mixed would be for a case like Oriya U+0B48, and right_mixed for a case like Kannada U+0CCA. other could include the Kharoshthi crossing marks. left, left_and_right, and left_mixed all require a glyphic reordering from logical order during layout.

For consonants, I think there may be a need for two distinct subtyping schemes.
One would capture the model implications of the different types of consonants we have encoded for various Brahmi-derived scripts:

Consonant_Structure_Type = [ordinary, dead, subjoined, medial, final]

The second would identify the liquids and nasals subject to special behavior in aksaras in many Brahmi-derived scripts:

Consonant_Join_Group = [nga, ya, ra, la, wa, ha, other]

The point of this second property would not be to claim that these are all important for every script -- but identifying each of these clearly for each Brahmi-derived script that contains an analog would pretty much cover the ground in terms of truly exceptional behavior. At least it is worth taking a first look at, to see if we can actually define this for all the Brahmi-derived scripts.

And then, for the dandas, a binary property:

Danda = True/False

I think this scheme generalizes to all the Brahmi-derived scripts pretty well. But for some of the extensions, it would probably make sense to also extend the Indic_Syllabic_Category somewhat.

For SE Asian scripts, we perhaps would have to add: [register_mark, tone_mark, killer].
For Tibetan: [head_letter, head_mark] and maybe others.
For Tai Le, add: [tone_letter].
For Khmer, Balinese, and Sundanese, add: [repha]

======================================================

Bindi/Anusvara (nasalization or -n) [Not derivable]

0901;DEVANAGARI SIGN CANDRABINDU;Mn;0;NSM;;;;;N;;;;;
0902;DEVANAGARI SIGN ANUSVARA;Mn;0;NSM;;;;;N;;;;;

======================================================

Visarga (-h) [Not derivable]

0903;DEVANAGARI SIGN VISARGA;Mc;0;L;;;;;N;;;;;

======================================================

Nukta (diacritic for borrowed consonants) [Not derivable]

093C;DEVANAGARI SIGN NUKTA;Mn;7;NSM;;;;;N;;;;;

======================================================

Avagraha (elision of initial a- in sandhi) [Not derivable]

093D;DEVANAGARI SIGN AVAGRAHA;Lo;0;L;;;;;N;;;;;

======================================================

Virama (killing of inherent vowel in consonant sequence, or consonant stacker, depending on model) [Derivation: ccc=9]

094D;DEVANAGARI SIGN VIRAMA;Mn;9;NSM;;;;;N;;;;;

======================================================

Independent Vowels [Not derivable]

0904;DEVANAGARI LETTER SHORT A;Lo;0;L;;;;;N;;;;;
0905;DEVANAGARI LETTER A;Lo;0;L;;;;;N;;;;;
0906;DEVANAGARI LETTER AA;Lo;0;L;;;;;N;;;;;
0907;DEVANAGARI LETTER I;Lo;0;L;;;;;N;;;;;
0908;DEVANAGARI LETTER II;Lo;0;L;;;;;N;;;;;
0909;DEVANAGARI LETTER U;Lo;0;L;;;;;N;;;;;
090A;DEVANAGARI LETTER UU;Lo;0;L;;;;;N;;;;;
090B;DEVANAGARI LETTER VOCALIC R;Lo;0;L;;;;;N;;;;;
090C;DEVANAGARI LETTER VOCALIC L;Lo;0;L;;;;;N;;;;;
090D;DEVANAGARI LETTER CANDRA E;Lo;0;L;;;;;N;;;;;
090E;DEVANAGARI LETTER SHORT E;Lo;0;L;;;;;N;;;;;
090F;DEVANAGARI LETTER E;Lo;0;L;;;;;N;;;;;
0910;DEVANAGARI LETTER AI;Lo;0;L;;;;;N;;;;;
0911;DEVANAGARI LETTER CANDRA O;Lo;0;L;;;;;N;;;;;
0912;DEVANAGARI LETTER SHORT O;Lo;0;L;;;;;N;;;;;
0913;DEVANAGARI LETTER O;Lo;0;L;;;;;N;;;;;
0914;DEVANAGARI LETTER AU;Lo;0;L;;;;;N;;;;;
0960;DEVANAGARI LETTER VOCALIC RR;Lo;0;L;;;;;N;;;;;
0961;DEVANAGARI LETTER VOCALIC LL;Lo;0;L;;;;;N;;;;;
0972;DEVANAGARI LETTER CANDRA A;Lo;0;L;;;;;N;;;;;
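
As a rough illustration of how the classes listed so far might be carried as Indic_Syllabic_Category values, here is a minimal Python sketch. The property and value names are the ones proposed in this document; the table name ISC_RANGES and the function indic_syllabic_category are hypothetical, introduced only for this example. The table covers only the classes enumerated up to this point, so the consonants and matras listed further on would still need to be added.

```python
# Hypothetical lookup table for the proposed Indic_Syllabic_Category (isc).
# Only the Devanagari classes listed up to this point are included;
# consonants and matras are not yet covered by this sketch.
ISC_RANGES = [
    (0x0901, 0x0902, "bindi"),     # candrabindu, anusvara
    (0x0903, 0x0903, "visarga"),
    (0x0904, 0x0914, "vowel"),     # independent vowels
    (0x093C, 0x093C, "nukta"),
    (0x093D, 0x093D, "avagraha"),
    (0x094D, 0x094D, "virama"),
    (0x0960, 0x0961, "vowel"),     # vocalic RR, vocalic LL
    (0x0972, 0x0972, "vowel"),     # candra A
]


def indic_syllabic_category(cp: int) -> str:
    """Return the sketched isc value for a code point; default is 'other'."""
    for first, last, value in ISC_RANGES:
        if first <= cp <= last:
            return value
    return "other"
```

Anything not in the table, including non-Indic letters such as U+0041, falls through to isc=other, which matches the default described above.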
======================================================

Consonants [Not derivable]

Ordinary

0915;DEVANAGARI LETTER KA;Lo;0;L;;;;;N;;;;;
0916;DEVANAGARI LETTER KHA;Lo;0;L;;;;;N;;;;;
0917;DEVANAGARI LETTER GA;Lo;0;L;;;;;N;;;;;
0918;DEVANAGARI LETTER GHA;Lo;0;L;;;;;N;;;;;
0919;DEVANAGARI LETTER NGA;Lo;0;L;;;;;N;;;;;
091A;DEVANAGARI LETTER CA;Lo;0;L;;;;;N;;;;;
091B;DEVANAGARI LETTER CHA;Lo;0;L;;;;;N;;;;;
091C;DEVANAGARI LETTER JA;Lo;0;L;;;;;N;;;;;
091D;DEVANAGARI LETTER JHA;Lo;0;L;;;;;N;;;;;
091E;DEVANAGARI LETTER NYA;Lo;0;L;;;;;N;;;;;
091F;DEVANAGARI LETTER TTA;Lo;0;L;;;;;N;;;;;
0920;DEVANAGARI LETTER TTHA;Lo;0;L;;;;;N;;;;;
0921;DEVANAGARI LETTER DDA;Lo;0;L;;;;;N;;;;;
0922;DEVANAGARI LETTER DDHA;Lo;0;L;;;;;N;;;;;
0923;DEVANAGARI LETTER NNA;Lo;0;L;;;;;N;;;;;
0924;DEVANAGARI LETTER TA;Lo;0;L;;;;;N;;;;;
0925;DEVANAGARI LETTER THA;Lo;0;L;;;;;N;;;;;
0926;DEVANAGARI LETTER DA;Lo;0;L;;;;;N;;;;;
0927;DEVANAGARI LETTER DHA;Lo;0;L;;;;;N;;;;;
0928;DEVANAGARI LETTER NA;Lo;0;L;;;;;N;;;;;
0929;DEVANAGARI LETTER NNNA;Lo;0;L;0928 093C;;;;N;;;;;
092A;DEVANAGARI LETTER PA;Lo;0;L;;;;;N;;;;;
092B;DEVANAGARI LETTER PHA;Lo;0;L;;;;;N;;;;;
092C;DEVANAGARI LETTER BA;Lo;0;L;;;;;N;;;;;
092D;DEVANAGARI LETTER BHA;Lo;0;L;;;;;N;;;;;
092E;DEVANAGARI LETTER MA;Lo;0;L;;;;;N;;;;;
092F;DEVANAGARI LETTER YA;Lo;0;L;;;;;N;;;;;
0930;DEVANAGARI LETTER RA;Lo;0;L;;;;;N;;;;;
0931;DEVANAGARI LETTER RRA;Lo;0;L;0930 093C;;;;N;;;;;
0932;DEVANAGARI LETTER LA;Lo;0;L;;;;;N;;;;;
0933;DEVANAGARI LETTER LLA;Lo;0;L;;;;;N;;;;;
0934;DEVANAGARI LETTER LLLA;Lo;0;L;0933 093C;;;;N;;;;;
0935;DEVANAGARI LETTER VA;Lo;0;L;;;;;N;;;;;
0936;DEVANAGARI LETTER SHA;Lo;0;L;;;;;N;;;;;
0937;DEVANAGARI LETTER SSA;Lo;0;L;;;;;N;;;;;
0938;DEVANAGARI LETTER SA;Lo;0;L;;;;;N;;;;;
0939;DEVANAGARI LETTER HA;Lo;0;L;;;;;N;;;;;
0958;DEVANAGARI LETTER QA;Lo;0;L;0915 093C;;;;N;;;;;
0959;DEVANAGARI LETTER KHHA;Lo;0;L;0916 093C;;;;N;;;;;
095A;DEVANAGARI LETTER GHHA;Lo;0;L;0917 093C;;;;N;;;;;
095B;DEVANAGARI LETTER ZA;Lo;0;L;091C 093C;;;;N;;;;;
095C;DEVANAGARI LETTER DDDHA;Lo;0;L;0921 093C;;;;N;;;;;
095D;DEVANAGARI LETTER RHA;Lo;0;L;0922 093C;;;;N;;;;;
095E;DEVANAGARI LETTER FA;Lo;0;L;092B 093C;;;;N;;;;;
095F;DEVANAGARI LETTER YYA;Lo;0;L;092F 093C;;;;N;;;;;
097B;DEVANAGARI LETTER GGA;Lo;0;L;;;;;N;;;;;
097C;DEVANAGARI LETTER JJA;Lo;0;L;;;;;N;;;;;
097D;DEVANAGARI LETTER GLOTTAL STOP;Lo;0;L;;;;;N;;;;;
097E;DEVANAGARI LETTER DDDA;Lo;0;L;;;;;N;;;;;
097F;DEVANAGARI LETTER BBA;Lo;0;L;;;;;N;;;;;

Dead

[Null set for Devanagari, but would include Malayalam chillus.]

Subjoined

[Null set for Devanagari, but would include Tibetan subjoined C's.]

Medial

[Null set for Devanagari, but would include Myanmar medial C's.]

Final

[Null set for Devanagari, but would include Limbu final C's.]

======================================================

Consonant Join Group (identifies characters which generally may have exceptional rendering in Brahmi-derived scripts) [Not derivable]

NGA

0919;DEVANAGARI LETTER NGA;Lo;0;L;;;;;N;;;;;

YA

092F;DEVANAGARI LETTER YA;Lo;0;L;;;;;N;;;;;

RA

0930;DEVANAGARI LETTER RA;Lo;0;L;;;;;N;;;;;

LA

0932;DEVANAGARI LETTER LA;Lo;0;L;;;;;N;;;;;

WA

0935;DEVANAGARI LETTER VA;Lo;0;L;;;;;N;;;;;

HA

0939;DEVANAGARI LETTER HA;Lo;0;L;;;;;N;;;;;

======================================================

Dependent Vowels (matras) [Not derivable]

Right-side

093E;DEVANAGARI VOWEL SIGN AA;Mc;0;L;;;;;N;;;;;
0940;DEVANAGARI VOWEL SIGN II;Mc;0;L;;;;;N;;;;;
0949;DEVANAGARI VOWEL SIGN CANDRA O;Mc;0;L;;;;;N;;;;;
094A;DEVANAGARI VOWEL SIGN SHORT O;Mc;0;L;;;;;N;;;;;
094B;DEVANAGARI VOWEL SIGN O;Mc;0;L;;;;;N;;;;;
094C;DEVANAGARI VOWEL SIGN AU;Mc;0;L;;;;;N;;;;;

Left-side

093F;DEVANAGARI VOWEL SIGN I;Mc;0;L;;;;;N;;;;;

Bottom

0941;DEVANAGARI VOWEL SIGN U;Mn;0;NSM;;;;;N;;;;;
0942;DEVANAGARI VOWEL SIGN UU;Mn;0;NSM;;;;;N;;;;;
0943;DEVANAGARI VOWEL SIGN VOCALIC R;Mn;0;NSM;;;;;N;;;;;
0944;DEVANAGARI VOWEL SIGN VOCALIC RR;Mn;0;NSM;;;;;N;;;;;
0962;DEVANAGARI VOWEL SIGN VOCALIC L;Mn;0;NSM;;;;;N;;;;;
0963;DEVANAGARI VOWEL SIGN VOCALIC LL;Mn;0;NSM;;;;;N;;;;;

Top

0945;DEVANAGARI VOWEL SIGN CANDRA E;Mn;0;NSM;;;;;N;;;;;
0946;DEVANAGARI VOWEL SIGN SHORT E;Mn;0;NSM;;;;;N;;;;;
0947;DEVANAGARI VOWEL SIGN E;Mn;0;NSM;;;;;N;;;;;
0948;DEVANAGARI VOWEL SIGN AI;Mn;0;NSM;;;;;N;;;;;

======================================================

Generic Accents [Derivation: gc=Mn, after bleeding of other categories]

0951;DEVANAGARI STRESS SIGN UDATTA;Mn;230;NSM;;;;;N;;;;;
0952;DEVANAGARI STRESS SIGN ANUDATTA;Mn;220;NSM;;;;;N;;;;;
0953;DEVANAGARI GRAVE ACCENT;Mn;230;NSM;;;;;N;;;;;
0954;DEVANAGARI ACUTE ACCENT;Mn;230;NSM;;;;;N;;;;;

======================================================

Punctuation [Derivation: gc=Po]

Dandas [Not derivable]

0964;DEVANAGARI DANDA;Po;0;L;;;;;N;;;;;
0965;DEVANAGARI DOUBLE DANDA;Po;0;L;;;;;N;;;;;

Other [Derivation: gc=Po, after bleeding of dandas]

0970;DEVANAGARI ABBREVIATION SIGN;Po;0;L;;;;;N;;;;;

======================================================

Digits [Derivation: gc=Nd]

0966;DEVANAGARI DIGIT ZERO;Nd;0;L;;0;0;0;N;;;;;
0967;DEVANAGARI DIGIT ONE;Nd;0;L;;1;1;1;N;;;;;
0968;DEVANAGARI DIGIT TWO;Nd;0;L;;2;2;2;N;;;;;
0969;DEVANAGARI DIGIT THREE;Nd;0;L;;3;3;3;N;;;;;
096A;DEVANAGARI DIGIT FOUR;Nd;0;L;;4;4;4;N;;;;;
096B;DEVANAGARI DIGIT FIVE;Nd;0;L;;5;5;5;N;;;;;
096C;DEVANAGARI DIGIT SIX;Nd;0;L;;6;6;6;N;;;;;
096D;DEVANAGARI DIGIT SEVEN;Nd;0;L;;7;7;7;N;;;;;
096E;DEVANAGARI DIGIT EIGHT;Nd;0;L;;8;8;8;N;;;;;
096F;DEVANAGARI DIGIT NINE;Nd;0;L;;9;9;9;N;;;;;

======================================================

Other

Letterlike Signs [Not derivable]

0950;DEVANAGARI OM;Lo;0;L;;;;;N;;;;;

Modifier Letters [Derivation: gc=Lm]

0971;DEVANAGARI SIGN HIGH SPACING DOT;Lm;0;L;;;;;N;;;;;
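
The "[Derivation: ...]" and "after bleeding" annotations above can be read as an ordered procedure: classes marked [Not derivable] are assigned from an explicit list first, and the derived classes then claim only what is left over. The following Python sketch illustrates that reading on a handful of the records listed above. It is an assumption about how the derivation might be implemented, not a settled algorithm; the names SAMPLE, EXPLICIT, and classify, and the class labels returned, are hypothetical.

```python
# A few records copied from the listings above (UnicodeData.txt layout:
# field 0 = code point, field 2 = General_Category, field 3 = ccc).
SAMPLE = """\
094D;DEVANAGARI SIGN VIRAMA;Mn;9;NSM;;;;;N;;;;;
0941;DEVANAGARI VOWEL SIGN U;Mn;0;NSM;;;;;N;;;;;
0951;DEVANAGARI STRESS SIGN UDATTA;Mn;230;NSM;;;;;N;;;;;
0964;DEVANAGARI DANDA;Po;0;L;;;;;N;;;;;
0970;DEVANAGARI ABBREVIATION SIGN;Po;0;L;;;;;N;;;;;
0966;DEVANAGARI DIGIT ZERO;Nd;0;L;;0;0;0;N;;;;;
0971;DEVANAGARI SIGN HIGH SPACING DOT;Lm;0;L;;;;;N;;;;;
"""

# Classes marked [Not derivable] have to be listed by hand. Only the two
# entries needed for SAMPLE are shown; a full table would also carry the
# bindi, nukta, and remaining matra characters, which are gc=Mn as well.
EXPLICIT = {0x0941: "matra", 0x0964: "danda"}


def classify(record: str) -> str:
    """Classify one UnicodeData-style record, explicit classes first."""
    fields = record.split(";")
    cp, gc, ccc = int(fields[0], 16), fields[2], int(fields[3])
    if cp in EXPLICIT:          # bleed off the non-derivable classes
        return EXPLICIT[cp]
    if ccc == 9:                # [Derivation: ccc=9]
        return "virama"
    if gc == "Mn":              # whatever Mn remains after bleeding
        return "generic_accent"
    if gc == "Po":              # Po after bleeding of dandas
        return "other_punctuation"
    if gc == "Nd":
        return "digit"
    if gc == "Lm":
        return "modifier_letter"
    return "unclassified"


result = {line.split(";")[0]: classify(line) for line in SAMPLE.splitlines()}
```

The order of the tests matters: if the gc=Mn check ran before the explicit list and the ccc=9 check, the matra U+0941 and the virama U+094D would both be swept into the generic accents.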