UTC/1999-024

Normative Changes to UnicodeData.txt after the beta period for Unicode 3.0

The following is the list of normative changes (and most of the informative changes) that occurred to the Unicode Character Database data files during the beta period. The normative changes need to be ratified by UTC decision.

I. Normative Changes to UnicodeData.txt

A. Bidi Category Changes

1. Consistency fixes for FORM FEED, NEXT LINE

< 000C;<control>;Cc;0;B;;;;;N;FORM FEED;;;;
> 000C;<control>;Cc;0;WS;;;;;N;FORM FEED;;;;
< 0085;<control>;Cc;0;BN;;;;;N;NEXT LINE;;;;
> 0085;<control>;Cc;0;B;;;;;N;NEXT LINE;;;;

Rationale: FORM FEED breaks a page, but does not behave like a Paragraph Separator. It is more consistent to give it bidi property WS. NEXT LINE, on the other hand, should be treated like a Paragraph Separator, rather than as one of the rest of the boundary neutrals. Consensus was reached on this by the bidi committee.

2. Consistency fix for NARROW NO-BREAK SPACE

< 202F;NARROW NO-BREAK SPACE;Zs;0;CS;<noBreak> 0020;;;;N;;;;;
> 202F;NARROW NO-BREAK SPACE;Zs;0;WS;<noBreak> 0020;;;;N;;;;;

Rationale: NARROW NO-BREAK SPACE should have the same bidi property as NO-BREAK SPACE. This was overlooked when the bidi property for NO-BREAK SPACE was updated earlier.

3. Consistency fix for an isolated APL symbol

< 2395;APL FUNCTIONAL SYMBOL QUAD;So;0;ON;;;;;N;;;;;
> 2395;APL FUNCTIONAL SYMBOL QUAD;So;0;L;;;;;N;;;;;

Rationale: All other APL symbols are "L". This was an oversight during the initial assignment of bidi properties to newly added symbols.

A'. Bidi Category Changes (plus informative change to General Category)

Yi radical consistency fix -- changed to match treatment for CJK radicals.

Rationale: All the new CJK radicals are General Category So, Bidi Category ON. There seemed no good reason to treat the Yi radicals any differently, since they are conceptually the same kind of entity as the CJK radicals.

< A490;YI RADICAL QOT;Lo;0;L;;;;;N;;;;;
< A491;YI RADICAL LI;Lo;0;L;;;;;N;;;;;
< A492;YI RADICAL KIT;Lo;0;L;;;;;N;;;;;
< A493;YI RADICAL NYIP;Lo;0;L;;;;;N;;;;;
< A494;YI RADICAL CYP;Lo;0;L;;;;;N;;;;;
< A495;YI RADICAL SSI;Lo;0;L;;;;;N;;;;;
< A496;YI RADICAL GGOP;Lo;0;L;;;;;N;;;;;
< A497;YI RADICAL GEP;Lo;0;L;;;;;N;;;;;
< A498;YI RADICAL MI;Lo;0;L;;;;;N;;;;;
< A499;YI RADICAL HXIT;Lo;0;L;;;;;N;;;;;
< A49A;YI RADICAL LYR;Lo;0;L;;;;;N;;;;;
< A49B;YI RADICAL BBUT;Lo;0;L;;;;;N;;;;;
< A49C;YI RADICAL MOP;Lo;0;L;;;;;N;;;;;
< A49D;YI RADICAL YO;Lo;0;L;;;;;N;;;;;
< A49E;YI RADICAL PUT;Lo;0;L;;;;;N;;;;;
< A49F;YI RADICAL HXUO;Lo;0;L;;;;;N;;;;;
< A4A0;YI RADICAL TAT;Lo;0;L;;;;;N;;;;;
< A4A1;YI RADICAL GA;Lo;0;L;;;;;N;;;;;
< A4A4;YI RADICAL DDUR;Lo;0;L;;;;;N;;;;;
< A4A5;YI RADICAL BUR;Lo;0;L;;;;;N;;;;;
< A4A6;YI RADICAL GGUO;Lo;0;L;;;;;N;;;;;
< A4A7;YI RADICAL NYOP;Lo;0;L;;;;;N;;;;;
< A4A8;YI RADICAL TU;Lo;0;L;;;;;N;;;;;
< A4A9;YI RADICAL OP;Lo;0;L;;;;;N;;;;;
< A4AA;YI RADICAL JJUT;Lo;0;L;;;;;N;;;;;
< A4AB;YI RADICAL ZOT;Lo;0;L;;;;;N;;;;;
< A4AC;YI RADICAL PYT;Lo;0;L;;;;;N;;;;;
< A4AD;YI RADICAL HMO;Lo;0;L;;;;;N;;;;;
< A4AE;YI RADICAL YIT;Lo;0;L;;;;;N;;;;;
< A4AF;YI RADICAL VUR;Lo;0;L;;;;;N;;;;;
< A4B0;YI RADICAL SHY;Lo;0;L;;;;;N;;;;;
< A4B1;YI RADICAL VEP;Lo;0;L;;;;;N;;;;;
< A4B2;YI RADICAL ZA;Lo;0;L;;;;;N;;;;;
< A4B3;YI RADICAL JO;Lo;0;L;;;;;N;;;;;
< A4B5;YI RADICAL JJY;Lo;0;L;;;;;N;;;;;
< A4B6;YI RADICAL GOT;Lo;0;L;;;;;N;;;;;
< A4B7;YI RADICAL JJIE;Lo;0;L;;;;;N;;;;;
< A4B8;YI RADICAL WO;Lo;0;L;;;;;N;;;;;
< A4B9;YI RADICAL DU;Lo;0;L;;;;;N;;;;;
< A4BA;YI RADICAL SHUR;Lo;0;L;;;;;N;;;;;
< A4BB;YI RADICAL LIE;Lo;0;L;;;;;N;;;;;
< A4BC;YI RADICAL CY;Lo;0;L;;;;;N;;;;;
< A4BD;YI RADICAL CUOP;Lo;0;L;;;;;N;;;;;
< A4BE;YI RADICAL CIP;Lo;0;L;;;;;N;;;;;
< A4BF;YI RADICAL HXOP;Lo;0;L;;;;;N;;;;;
< A4C0;YI RADICAL SHAT;Lo;0;L;;;;;N;;;;;
< A4C2;YI RADICAL SHOP;Lo;0;L;;;;;N;;;;;
< A4C3;YI RADICAL CHE;Lo;0;L;;;;;N;;;;;
< A4C4;YI RADICAL ZZIET;Lo;0;L;;;;;N;;;;;
< A4C6;YI RADICAL KE;Lo;0;L;;;;;N;;;;;
---
> A490;YI RADICAL QOT;So;0;ON;;;;;N;;;;;
> A491;YI RADICAL LI;So;0;ON;;;;;N;;;;;
> A492;YI RADICAL KIT;So;0;ON;;;;;N;;;;;
> A493;YI RADICAL NYIP;So;0;ON;;;;;N;;;;;
> A494;YI RADICAL CYP;So;0;ON;;;;;N;;;;;
> A495;YI RADICAL SSI;So;0;ON;;;;;N;;;;;
> A496;YI RADICAL GGOP;So;0;ON;;;;;N;;;;;
> A497;YI RADICAL GEP;So;0;ON;;;;;N;;;;;
> A498;YI RADICAL MI;So;0;ON;;;;;N;;;;;
> A499;YI RADICAL HXIT;So;0;ON;;;;;N;;;;;
> A49A;YI RADICAL LYR;So;0;ON;;;;;N;;;;;
> A49B;YI RADICAL BBUT;So;0;ON;;;;;N;;;;;
> A49C;YI RADICAL MOP;So;0;ON;;;;;N;;;;;
> A49D;YI RADICAL YO;So;0;ON;;;;;N;;;;;
> A49E;YI RADICAL PUT;So;0;ON;;;;;N;;;;;
> A49F;YI RADICAL HXUO;So;0;ON;;;;;N;;;;;
> A4A0;YI RADICAL TAT;So;0;ON;;;;;N;;;;;
> A4A1;YI RADICAL GA;So;0;ON;;;;;N;;;;;
> A4A4;YI RADICAL DDUR;So;0;ON;;;;;N;;;;;
> A4A5;YI RADICAL BUR;So;0;ON;;;;;N;;;;;
> A4A6;YI RADICAL GGUO;So;0;ON;;;;;N;;;;;
> A4A7;YI RADICAL NYOP;So;0;ON;;;;;N;;;;;
> A4A8;YI RADICAL TU;So;0;ON;;;;;N;;;;;
> A4A9;YI RADICAL OP;So;0;ON;;;;;N;;;;;
> A4AA;YI RADICAL JJUT;So;0;ON;;;;;N;;;;;
> A4AB;YI RADICAL ZOT;So;0;ON;;;;;N;;;;;
> A4AC;YI RADICAL PYT;So;0;ON;;;;;N;;;;;
> A4AD;YI RADICAL HMO;So;0;ON;;;;;N;;;;;
> A4AE;YI RADICAL YIT;So;0;ON;;;;;N;;;;;
> A4AF;YI RADICAL VUR;So;0;ON;;;;;N;;;;;
> A4B0;YI RADICAL SHY;So;0;ON;;;;;N;;;;;
> A4B1;YI RADICAL VEP;So;0;ON;;;;;N;;;;;
> A4B2;YI RADICAL ZA;So;0;ON;;;;;N;;;;;
> A4B3;YI RADICAL JO;So;0;ON;;;;;N;;;;;
> A4B5;YI RADICAL JJY;So;0;ON;;;;;N;;;;;
> A4B6;YI RADICAL GOT;So;0;ON;;;;;N;;;;;
> A4B7;YI RADICAL JJIE;So;0;ON;;;;;N;;;;;
> A4B8;YI RADICAL WO;So;0;ON;;;;;N;;;;;
> A4B9;YI RADICAL DU;So;0;ON;;;;;N;;;;;
> A4BA;YI RADICAL SHUR;So;0;ON;;;;;N;;;;;
> A4BB;YI RADICAL LIE;So;0;ON;;;;;N;;;;;
> A4BC;YI RADICAL CY;So;0;ON;;;;;N;;;;;
> A4BD;YI RADICAL CUOP;So;0;ON;;;;;N;;;;;
> A4BE;YI RADICAL CIP;So;0;ON;;;;;N;;;;;
> A4BF;YI RADICAL HXOP;So;0;ON;;;;;N;;;;;
> A4C0;YI RADICAL SHAT;So;0;ON;;;;;N;;;;;
> A4C2;YI RADICAL SHOP;So;0;ON;;;;;N;;;;;
> A4C3;YI RADICAL CHE;So;0;ON;;;;;N;;;;;
> A4C4;YI RADICAL ZZIET;So;0;ON;;;;;N;;;;;
> A4C6;YI RADICAL KE;So;0;ON;;;;;N;;;;;

B. General Category Changes

1. Non-case-mapped characters changed Lo > Ll

< 01AA;LATIN LETTER REVERSED ESH LOOP;Lo;0;L;;;;;N;;;;;
> 01AA;LATIN LETTER REVERSED ESH LOOP;Ll;0;L;;;;;N;;;;;
< 01BE;LATIN LETTER INVERTED GLOTTAL STOP WITH STROKE;Lo;0;L;;;;;N;LATIN LETTER INVERTED GLOTTAL STOP BAR;;;;
> 01BE;LATIN LETTER INVERTED GLOTTAL STOP WITH STROKE;Ll;0;L;;;;;N;LATIN LETTER INVERTED GLOTTAL STOP BAR;;;;
< 03F3;GREEK LETTER YOT;Lo;0;L;;;;;N;;;;;
> 03F3;GREEK LETTER YOT;Ll;0;L;;;;;N;;;;;

Rationale: These three letters are lowercase forms, even though they have no case mappings. They should be treated like other lowercase forms with no case mappings (see various IPA characters for precedents).

2. Non-case-mapped character changed Lo > Lu

< 04C0;CYRILLIC LETTER PALOCHKA;Lo;0;L;;;;;N;CYRILLIC LETTER I;;;;
> 04C0;CYRILLIC LETTER PALOCHKA;Lu;0;L;;;;;N;CYRILLIC LETTER I;;;;

Rationale: The PALOCHKA has no case mapping, but its form is that of an uppercase I. Thus, Lu is a more appropriate assignment of General Category than Lo.

C. Decomposition Changes

1. Remove compatibility decomposition for 2 ASCII spacing accents and low line.

< 005E;CIRCUMFLEX ACCENT;Sk;0;ON;<compat> 0020 0302;;;;N;SPACING CIRCUMFLEX;;;;
< 005F;LOW LINE;Pc;0;ON;<compat> 0020 0332;;;;N;SPACING UNDERSCORE;;;;
< 0060;GRAVE ACCENT;Sk;0;ON;<compat> 0020 0300;;;;N;SPACING GRAVE;;;;
---
> 005E;CIRCUMFLEX ACCENT;Sk;0;ON;;;;;N;SPACING CIRCUMFLEX;;;;
> 005F;LOW LINE;Pc;0;ON;;;;;N;SPACING UNDERSCORE;;;;
> 0060;GRAVE ACCENT;Sk;0;ON;;;;;N;SPACING GRAVE;;;;

Rationale: These characters in ASCII cause problems if they are not invariant across the normalization forms. Removal of the compatibility decompositions for these 3 was the least problematic solution for fixing Normalization Form KC.

2. Add or remove canonical decompositions for nuktated Indic forms.

< 09B0;BENGALI LETTER RA;Lo;0;L;09AC 09BC;;;;N;;;;;
> 09B0;BENGALI LETTER RA;Lo;0;L;;;;;N;;;;;
< 0A33;GURMUKHI LETTER LLA;Lo;0;L;;;;;N;;;;;
> 0A33;GURMUKHI LETTER LLA;Lo;0;L;0A32 0A3C;;;;N;;;;;
< 0A36;GURMUKHI LETTER SHA;Lo;0;L;;;;;N;;;;;
> 0A36;GURMUKHI LETTER SHA;Lo;0;L;0A38 0A3C;;;;N;;;;;
< 0A5C;GURMUKHI LETTER RRA;Lo;0;L;0A21 0A3C;;;;N;;;;;
> 0A5C;GURMUKHI LETTER RRA;Lo;0;L;;;;;N;;;;;
< 0B5F;ORIYA LETTER YYA;Lo;0;L;0B2F 0B3C;;;;N;;;;;
> 0B5F;ORIYA LETTER YYA;Lo;0;L;;;;;N;;;;;

Rationale: Requests from Indic experts, to more closely match native speaker intuitions about these characters, and (in some cases) to follow visible form rather than structural positional analogy to Devanagari.

3. Change SARA AM decomposition from canonical to compatibility.

< 0E33;THAI CHARACTER SARA AM;Lo;0;L;0E4D 0E32;;;;N;THAI VOWEL SIGN SARA AM;;;;
> 0E33;THAI CHARACTER SARA AM;Lo;0;L;<compat> 0E4D 0E32;;;;N;THAI VOWEL SIGN SARA AM;;;;
< 0EB3;LAO VOWEL SIGN AM;Lo;0;L;0ECD 0EB2;;;;N;;;;;
> 0EB3;LAO VOWEL SIGN AM;Lo;0;L;<compat> 0ECD 0EB2;;;;N;;;;;

Rationale: These changes were to fix a problem in normalization.

D. Canonical Ordering Class Changes

< 0E4D;THAI CHARACTER NIKHAHIT;Mn;107;NSM;;;;;N;THAI NIKKHAHIT;nikkhahit;;;
> 0E4D;THAI CHARACTER NIKHAHIT;Mn;0;NSM;;;;;N;THAI NIKKHAHIT;nikkhahit;;;
< 0ECD;LAO NIGGAHITA;Mn;122;NSM;;;;;N;;;;;
> 0ECD;LAO NIGGAHITA;Mn;0;NSM;;;;;N;;;;;

Rationale: Related to the fixes for SARA AM for normalization. These two characters should never participate in a rearrangement, and so were assigned class 0. Having a decomposition starting with non-zero class also causes difficulties.

E. Numerical Value Change

< 09F8;BENGALI CURRENCY NUMERATOR ONE LESS THAN THE DENOMINATOR;No;0;L;;;;-1;N;;;;;
> 09F8;BENGALI CURRENCY NUMERATOR ONE LESS THAN THE DENOMINATOR;No;0;L;;;;;N;;;;;

Rationale: The strange numeric value of -1 for this character causes problems for some implementations. It requires special format knowledge and is not helpful by itself in the data file.

II. Informative Changes to UnicodeData.txt

A. General Category Changes

1. So > Sm

< 219A;LEFTWARDS ARROW WITH STROKE;So;0;ON;2190 0338;;;;N;LEFT ARROW WITH STROKE;;;;
< 219B;RIGHTWARDS ARROW WITH STROKE;So;0;ON;2192 0338;;;;N;RIGHT ARROW WITH STROKE;;;;
---
> 219A;LEFTWARDS ARROW WITH STROKE;Sm;0;ON;2190 0338;;;;N;LEFT ARROW WITH STROKE;;;;
> 219B;RIGHTWARDS ARROW WITH STROKE;Sm;0;ON;2192 0338;;;;N;RIGHT ARROW WITH STROKE;;;;
< 21A0;RIGHTWARDS TWO HEADED ARROW;So;0;ON;;;;;N;RIGHT TWO HEADED ARROW;;;;
> 21A0;RIGHTWARDS TWO HEADED ARROW;Sm;0;ON;;;;;N;RIGHT TWO HEADED ARROW;;;;
< 21A3;RIGHTWARDS ARROW WITH TAIL;So;0;ON;;;;;N;RIGHT ARROW WITH TAIL;;;;
> 21A3;RIGHTWARDS ARROW WITH TAIL;Sm;0;ON;;;;;N;RIGHT ARROW WITH TAIL;;;;
< 21A6;RIGHTWARDS ARROW FROM BAR;So;0;ON;;;;;N;RIGHT ARROW FROM BAR;;;;
> 21A6;RIGHTWARDS ARROW FROM BAR;Sm;0;ON;;;;;N;RIGHT ARROW FROM BAR;;;;
< 21AE;LEFT RIGHT ARROW WITH STROKE;So;0;ON;2194 0338;;;;N;;;;;
> 21AE;LEFT RIGHT ARROW WITH STROKE;Sm;0;ON;2194 0338;;;;N;;;;;
< 21CE;LEFT RIGHT DOUBLE ARROW WITH STROKE;So;0;ON;21D4 0338;;;;N;;;;;
< 21CF;RIGHTWARDS DOUBLE ARROW WITH STROKE;So;0;ON;21D2 0338;;;;N;RIGHT DOUBLE ARROW WITH STROKE;;;;
---
> 21CE;LEFT RIGHT DOUBLE ARROW WITH STROKE;Sm;0;ON;21D4 0338;;;;N;;;;;
> 21CF;RIGHTWARDS DOUBLE ARROW WITH STROKE;Sm;0;ON;21D2 0338;;;;N;RIGHT DOUBLE ARROW WITH STROKE;;;;
< 25B7;WHITE RIGHT-POINTING TRIANGLE;So;0;ON;;;;;N;WHITE RIGHT POINTING TRIANGLE;;;;
> 25B7;WHITE RIGHT-POINTING TRIANGLE;Sm;0;ON;;;;;N;WHITE RIGHT POINTING TRIANGLE;;;;
< 25C1;WHITE LEFT-POINTING TRIANGLE;So;0;ON;;;;;N;WHITE LEFT POINTING TRIANGLE;;;;
> 25C1;WHITE LEFT-POINTING TRIANGLE;Sm;0;ON;;;;;N;WHITE LEFT POINTING TRIANGLE;;;;
< 266F;MUSIC SHARP SIGN;So;0;ON;;;;;N;SHARP;;;;
> 266F;MUSIC SHARP SIGN;Sm;0;ON;;;;;N;SHARP;;;;

Rationale: These changes resulted from the input from AMS regarding mathematical functions for these symbols (in particular, their use in z-notation). Accordingly, their General Category was updated to Sm.

2. So > Po

< 0E4F;THAI CHARACTER FONGMAN;So;0;L;;;;;N;THAI FONGMAN;;;;
> 0E4F;THAI CHARACTER FONGMAN;Po;0;L;;;;;N;THAI FONGMAN;;;;
< 17DC;KHMER SIGN AVAKRAHASANYA;So;0;L;;;;;N;;;;;
> 17DC;KHMER SIGN AVAKRAHASANYA;Po;0;L;;;;;N;;;;;

Rationale: Input on the names list annotations turned up the fact that these two were better treated as punctuation marks, rather than just symbols.

3. So > Ps/Pe

< 169B;OGHAM FEATHER MARK;So;0;L;;;;;N;;;;;
< 169C;OGHAM REVERSED FEATHER MARK;So;0;L;;;;;N;;;;;
---
> 169B;OGHAM FEATHER MARK;Ps;0;ON;;;;;N;;;;;
> 169C;OGHAM REVERSED FEATHER MARK;Pe;0;ON;;;;;N;;;;;

Rationale: Input on the names list annotations clarified that these are bracketing punctuation, rather than just symbols.

4. Po > Pd

< 1806;MONGOLIAN TODO SOFT HYPHEN;Po;0;ON;;;;;N;;;;;
> 1806;MONGOLIAN TODO SOFT HYPHEN;Pd;0;ON;;;;;N;;;;;

Rationale: Rectifies an oversight in identifying this as a dash.

III. Editorial Changes to Blocks.txt

Correct order of line in file:

4E00; 9FFF; CJK Unified Ideographs

IV. Editorial Changes to ArabicShaping.txt

< 0629; TEH MARBUTAH; R; TEH MARBUTAH
> 0629; TEH MARBUTA; R; TEH MARBUTA
< 0649; ALEF MAQSURA; R; ALEF MAQSURA
> 0649; ALEF MAKSURA; R; YEH
< 06CD; YEH WITH TAIL; R; ALEF MAQSURA
> 06CD; YEH WITH TAIL; R; YEH WITH TAIL

Rationale: Spelling fixes, and editorial oversight in updating the shaping classes for 0649 and 06CD.

V. Normative Changes to ArabicShaping.txt

Added Syriac shaping classes.

VI. Normative Changes to CompositionExclusions.txt

Added:

> 0A33 # GURMUKHI LETTER LLA
> 0A36 # GURMUKHI LETTER SHA

Rationale: For consistency with change in Indic decompositions.

> 0F43 # TIBETAN LETTER GHA
> 0F4D # TIBETAN LETTER DDHA
> 0F52 # TIBETAN LETTER DHA
> 0F57 # TIBETAN LETTER BHA
> 0F5C # TIBETAN LETTER DZHA
> 0F69 # TIBETAN LETTER KSSA
> 0F76 # TIBETAN VOWEL SIGN VOCALIC R
> 0F78 # TIBETAN VOWEL SIGN VOCALIC L
> 0F93 # TIBETAN SUBJOINED LETTER GHA
> 0F9D # TIBETAN SUBJOINED LETTER DDHA
> 0FA2 # TIBETAN SUBJOINED LETTER DHA
> 0FA7 # TIBETAN SUBJOINED LETTER BHA
> 0FAC # TIBETAN SUBJOINED LETTER DZHA
> 0FB9 # TIBETAN SUBJOINED LETTER KSSA

Rationale: Like Hebrew and Indic scripts, the preferred form for Tibetan, even for Normalization Form C, is decomposed. Hence, these exclusions were added to the script-specific section of the data file.

> FB44 # HEBREW LETTER PE WITH DAGESH

Rationale: Correction of an oversight -- missing item in the list.

Removed:

< 0A5C # GURMUKHI LETTER RRA
< 0B5F # ORIYA LETTER YYA

Rationale: For consistency with change in Indic decompositions.
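
The normalization-related changes above (sections I.C, I.D and VI) can be spot-checked with a short script. The following is only a sketch, not part of the UTC document. It assumes Python's standard unicodedata module, which reflects whatever Unicode version ships with the interpreter rather than Unicode 3.0. The helper names hexes and report, and the choice of sample characters, are illustrative.

```python
import unicodedata


def hexes(text: str) -> str:
    """Render a string as space-separated code points, e.g. '0A32 0A3C'."""
    return " ".join(f"{ord(ch):04X}" for ch in text)


def report(ch: str) -> dict:
    """Collect the normalization-related properties of one character.

    Values come from the interpreter's bundled Unicode data, which is an
    assumption here: it is a later version than the 3.0 beta discussed above.
    """
    return {
        "decomposition": unicodedata.decomposition(ch),  # '' when there is none
        "ccc": unicodedata.combining(ch),  # canonical combining class
        "NFC": hexes(unicodedata.normalize("NFC", ch)),
        "NFKC": hexes(unicodedata.normalize("NFKC", ch)),
    }


# Sample characters taken from sections I.C, I.D and VI.
SAMPLES = ["\u005E", "\u0E33", "\u0E4D", "\u0A33", "\u0A5C", "\u0F43"]

for ch in SAMPLES:
    print(f"{ord(ch):04X} {unicodedata.name(ch)}: {report(ch)}")
```

A character listed in CompositionExclusions.txt, such as 0A33, should report a canonical decomposition and remain decomposed under NFC, while 0A5C, whose decomposition was removed, should report none. 0E33 should show a decomposition tagged <compat> and be left unchanged by NFC.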