L2/09-353

Source: Mark Davis

Date: October 21, 2009

Subject: Default Ignorable code points in UCA

We received a bug report that a soft hyphen wasn't sorting as ignorable in UCA. I wrote a test, and it turns out that we are failing on the characters listed below, all of which are Default Ignorable code points.

For the 7 assigned characters, I recommend changing the table for UCA 6.0. For the unassigned characters, we could either add them to the table or change the calculation in http://unicode.org/reports/tr10/#Implicit_Weights.
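To make the second option concrete, here is a minimal Python sketch of what changing the calculation could look like. It assumes the implicit-weight derivation in UTS #10, with base FBC0 for code points that are not ideographs, and it hard-codes the unassigned ranges listed at the end of this memo. The function names and the skip_default_ignorables flag are hypothetical, chosen only for illustration; this is not a proposed specification text.

```python
# Sketch only: implicit collation elements for a code point with no table entry.
# Assumption: the UTS #10 derivation, AAAA = BASE + (CP >> 15) and
# BBBB = (CP & 0x7FFF) | 0x8000, with BASE = FBC0 for non-ideographs.
UNASSIGNED_BASE = 0xFBC0

# Ranges copied from the "Bad Unassigned Characters" set in this memo.
DEFAULT_IGNORABLE_UNASSIGNED = [
    (0x2065, 0x2069),
    (0xFFF0, 0xFFF8),
    (0xE0000, 0xE0000),
    (0xE0002, 0xE001F),
    (0xE0080, 0xE00FF),
    (0xE01F0, 0xE0FFF),
]


def is_default_ignorable_unassigned(cp):
    """True if cp falls in one of the ranges listed in this memo."""
    return any(lo <= cp <= hi for lo, hi in DEFAULT_IGNORABLE_UNASSIGNED)


def implicit_elements(cp, skip_default_ignorables=False):
    """Return (primary, secondary, tertiary) collation elements for cp.

    With skip_default_ignorables=True (the hypothetical change), the listed
    code points map to no collation elements, i.e. they are ignored.
    """
    if skip_default_ignorables and is_default_ignorable_unassigned(cp):
        return []
    aaaa = UNASSIGNED_BASE + (cp >> 15)
    bbbb = (cp & 0x7FFF) | 0x8000
    return [(aaaa, 0x0020, 0x0002), (bbbb, 0x0000, 0x0000)]


if __name__ == "__main__":
    for cp in (0x2065, 0xE0000):
        current = implicit_elements(cp)
        changed = implicit_elements(cp, skip_default_ignorables=True)
        print(f"U+{cp:04X}: current={current} changed={changed}")
```

Without the flag, U+2065 gets a non-zero primary weight and therefore does not sort as ignorable, which is the failure the test reports for the unassigned set.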

Bad Assigned Characters: 7

[\u00AD\u115F\u1160\u17B4\u17B5\u3164\uFFA0]

U+00AD SOFT HYPHEN

U+115F HANGUL CHOSEONG FILLER

U+1160 HANGUL JUNGSEONG FILLER

U+17B4 KHMER VOWEL INHERENT AQ

U+17B5 KHMER VOWEL INHERENT AA

U+3164 HANGUL FILLER

U+FFA0 HALFWIDTH HANGUL FILLER

Bad Unassigned Characters: 3773

[\u2065-\u2069\uFFF0-\uFFF8\U000E0000\U000E0002-\U000E001F\U000E0080-\U000E00FF\U000E01F0-\U000E0FFF]
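As a quick cross-check of the count above, the following Python sketch sums the sizes of the ranges in the set. The range list is transcribed by hand from the pattern, and the names are illustrative only.

```python
# Sketch only: recompute the size of the unassigned set from its ranges.
# Each pair is (first, last), inclusive, transcribed from the set above.
UNASSIGNED_RANGES = [
    (0x2065, 0x2069),
    (0xFFF0, 0xFFF8),
    (0xE0000, 0xE0000),
    (0xE0002, 0xE001F),
    (0xE0080, 0xE00FF),
    (0xE01F0, 0xE0FFF),
]


def count_code_points(ranges):
    """Total number of code points covered by inclusive (first, last) ranges."""
    return sum(last - first + 1 for first, last in ranges)


if __name__ == "__main__":
    for first, last in UNASSIGNED_RANGES:
        print(f"U+{first:04X}..U+{last:04X}: {last - first + 1}")
    print("total:", count_code_points(UNASSIGNED_RANGES))
```

The per-range sizes are 5, 9, 1, 30, 128, and 3600, which add up to the 3773 reported.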