L2/06-386
Source: Mark Davis
Date: November 9, 2006
Subject: Properties not preserving canonical equivalence

In response to the issues raised by Kent, I wrote a test during the break in the meeting and ran it over a number of the properties, to determine when the values change if the character is NFC'ed. As a simplification, I just look at whether the properties for the first character of the decomposition change. This will catch all the common cases: singletons and base+accents.

Here are the results. In particular it looks like we also need to do the Word_Break property for FA30..FA6A, and we might want to align some others like East_Asian_Width.

Properties tested:

ASCII_Hex_Digit, Alphabetic, Bidi_Class, Bidi_Control, Bidi_Mirrored, Canonical_Combining_Class, Case_Fold_Turkish_I, Dash, Default_Ignorable_Code_Point, Diacritic, East_Asian_Width, Extender, General_Category, Grapheme_Base, Grapheme_Cluster_Break, Grapheme_Extend, Grapheme_Link, Hangul_Syllable_Type, Hex_Digit, Hyphen, IDS_Binary_Operator, IDS_Trinary_Operator, ID_Continue, ID_Start, Ideographic, Join_Control, Joining_Group, Joining_Type, Line_Break, Logical_Order_Exception, Lowercase, Math, Non_Break, Noncharacter_Code_Point, Numeric_Type, Numeric_Value, Other_Alphabetic, Other_Default_Ignorable_Code_Point, Other_Grapheme_Extend, Other_ID_Continue, Other_ID_Start, Other_Lowercase, Other_Math, Other_Uppercase, Pattern_Syntax, Pattern_White_Space, Quotation_Mark, Radical, STerm, Script, Sentence_Break, Soft_Dotted, Terminal_Punctuation, Uppercase, Variation_Selector, White_Space, Word_Break, XID_Continue, XID_Start

Cases where differences are found:

[Alphabetic, General_Category, ID_Continue, ID_Start, Script, Sentence_Break, Word_Break, XID_Continue, XID_Start]
0374 # Sk GREEK NUMERAL SIGN
# Total code points: 1

[East_Asian_Width, Pattern_Syntax]
037E # Po GREEK QUESTION MARK
# Total code points: 1

[Diacritic, East_Asian_Width, Extender, Line_Break, Terminal_Punctuation, Word_Break, XID_Continue]
0387 # Po GREEK ANO TELEIA
# Total code points: 1

[Canonical_Combining_Class]
0F73 # Mn TIBETAN VOWEL SIGN II
0F75 # Mn TIBETAN VOWEL SIGN UU
0F81 # Mn TIBETAN VOWEL SIGN REVERSED II
# Total code points: 3

[East_Asian_Width]
1FBE # L& GREEK PROSGEGRAMMENI
212A # L& KELVIN SIGN
# Total code points: 2

[East_Asian_Width, Pattern_Syntax, Script]
1FEF # Sk GREEK VARIA
# Total code points: 1

[East_Asian_Width, Line_Break, Script]
1FFD # Sk GREEK OXIA
# Total code points: 1

[East_Asian_Width, Line_Break]
212B # L& ANGSTROM SIGN
# Total code points: 1

[Bidi_Mirrored]
2ADC # Sm FORKING
# Total code points: 1

[Numeric_Type, Numeric_Value]
F96B # Lo CJK COMPATIBILITY IDEOGRAPH-F96B
F973 # Lo CJK COMPATIBILITY IDEOGRAPH-F973
F978 # Lo CJK COMPATIBILITY IDEOGRAPH-F978
F9B2 # Lo CJK COMPATIBILITY IDEOGRAPH-F9B2
F9D1 # Lo CJK COMPATIBILITY IDEOGRAPH-F9D1
F9D3 # Lo CJK COMPATIBILITY IDEOGRAPH-F9D3
F9FD # Lo CJK COMPATIBILITY IDEOGRAPH-F9FD
2F890 # Lo CJK COMPATIBILITY IDEOGRAPH-2F890
# Total code points: 8

[Ideographic, Word_Break]
FA30..FA6A # Lo [59] CJK COMPATIBILITY IDEOGRAPH-FA30..CJK COMPATIBILITY IDEOGRAPH-FA6A
# Total code points: 59
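
The test described at the top of this document can be approximated with a short script. The following is a minimal sketch, not the original test: it assumes Python's standard unicodedata module, which exposes only a handful of the properties listed above and reflects whatever Unicode version the interpreter ships with, so its output will not match the 2006 results exactly. It also assumes one reading of the simplification: only characters that change under NFC are considered, and each is compared with the first character of its NFC form. The names PROPERTIES and property_differences are hypothetical.

```python
import unicodedata

# Assumption: only the properties that the standard unicodedata module
# exposes are compared; this is a small subset of the list in this document.
PROPERTIES = {
    "General_Category": unicodedata.category,
    "Bidi_Class": unicodedata.bidirectional,
    "Canonical_Combining_Class": unicodedata.combining,
    "East_Asian_Width": unicodedata.east_asian_width,
    "Bidi_Mirrored": unicodedata.mirrored,
    "Numeric_Value": lambda ch: unicodedata.numeric(ch, None),
}


def property_differences(cp):
    """Return {property: (original value, value after NFC)} for one code point.

    Only the first character of the NFC form is examined, mirroring the
    simplification described in the text. Code points that are unchanged
    by NFC yield an empty dict.
    """
    ch = chr(cp)
    nfc = unicodedata.normalize("NFC", ch)
    if nfc == ch:
        return {}
    first = nfc[0]
    diffs = {}
    for name, getter in PROPERTIES.items():
        before, after = getter(ch), getter(first)
        if before != after:
            diffs[name] = (before, after)
    return diffs


if __name__ == "__main__":
    # A few code points taken from the results listed in this document.
    for cp in (0x0F73, 0x212A, 0x212B, 0x2ADC, 0xFA30):
        names = sorted(property_differences(cp))
        print(f"{cp:04X} {unicodedata.name(chr(cp), '<unnamed>')}: {names}")
```

To survey all code points rather than a sample, the same function can be called over range(0x110000), skipping the surrogate range D800..DFFF. Properties such as Word_Break, Line_Break and Script are not available from unicodedata and would need to be read from the UCD data files directly.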