L2/06-164
Date: Sun, 07 May 2006
From: Mark Davis
Subject: Clarification of Annex 8: Detecting Normalization Forms
------
(please use monospace font to view this document)

http://www.unicode.org/reports/tr15/#Annex8 describes the use of the Quick_Check properties, and the use of "Stable Code Points" (defined in A8.1). It should be expanded somewhat to clarify a few points to prevent people from making assumptions in their code that will bite them.

The section points out that when concatenating two NFC or NFKC strings A and B to produce a normalized result, an implementation can optimize the concatenation. It can find the last Stable code point in the first string, and the first Stable code point in the second string. The implementation only has to normalize the range between those two code points to do the concatenation. This can be a significant savings in performance when concatenating large strings.

Examples:

   A        B          A+B          Comment
1. "...a"   "¨b..."    "...äb..."   One need only normalize from the last character in A up to the second character in B (¨ here stands for U+0308 COMBINING DIAERESIS)
2. "...ᄀ"   "ᅡb..."    "...가b..."   B doesn't have to start with a combining mark to require processing
3. "...ᄀ"   "ᅡᆨb..."   "...각b..."   We may have to process the first *2* starters in B

We don't actually provide property values for Stable characters, so we leave that as a problem for the reader. Some may use Starter (ccc=0) as a proxy for that; we need to make clear that that definitely won't work, as we see in example 3 above! Based on the data we provide, one could use QuickCheck=YES, but we don't explain that that is safe; the problem is that we are not absolutely clear on the definition of QuickCheck=YES.

If two characters can compose in a particular form, I'll call the first one a Composes With Following (CWF) character and the second a Composes With Previous (CWP) character. A Stable character is thus a starter that is neither CWF nor CWP. (These are probably not the best names -- feedback welcome on rewording.)
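As a rough illustration of the optimization described above, the following Python sketch derives CWF and CWP sets from the canonical decompositions exposed by Python's standard unicodedata module, then renormalizes only the span from the last Stable code point of A up to the first Stable code point of B. The names build_cwf_cwp, is_stable and concat_nfc are made up for this example, and the sets reflect whatever Unicode version the interpreter ships, not necessarily the one tallied in this document. It is a sketch under those assumptions, not a reference implementation.

```python
import unicodedata


def build_cwf_cwp():
    """Collect the CWF and CWP sets for NFC (hypothetical helper)."""
    cwf, cwp = set(), set()
    for cp in range(0x110000):
        ch = chr(cp)
        mapping = unicodedata.decomposition(ch)
        if not mapping or mapping.startswith("<"):
            continue  # no decomposition, or a compatibility one
        parts = mapping.split()
        # Keep only canonical pairs that NFC really recomposes
        # (this drops singletons and composition exclusions).
        if len(parts) == 2 and unicodedata.normalize("NFC", ch) == ch:
            cwf.add(chr(int(parts[0], 16)))
            cwp.add(chr(int(parts[1], 16)))
    # Hangul composes algorithmically, so add the jamo and LV syllables.
    cwf.update(chr(c) for c in range(0x1100, 0x1113))      # leading consonants
    cwp.update(chr(c) for c in range(0x1161, 0x1176))      # vowels
    cwp.update(chr(c) for c in range(0x11A8, 0x11C3))      # trailing consonants
    cwf.update(chr(c) for c in range(0xAC00, 0xD7A4, 28))  # LV syllables
    return cwf, cwp


CWF, CWP = build_cwf_cwp()


def is_stable(ch):
    """Stable = starter (ccc=0) that is neither CWF nor CWP."""
    return unicodedata.combining(ch) == 0 and ch not in CWF and ch not in CWP


def concat_nfc(a, b):
    """Concatenate two strings assumed to be NFC, normalizing only the seam."""
    # Scan backwards in A for the last Stable code point (index 0 if none).
    i = len(a)
    while i > 0:
        i -= 1
        if is_stable(a[i]):
            break
    # Scan forwards in B for the first Stable code point (len(b) if none).
    j = 0
    while j < len(b) and not is_stable(b[j]):
        j += 1
    middle = unicodedata.normalize("NFC", a[i:] + b[:j])
    return a[:i] + middle + b[j:]


# The three examples, padded with spaces and digits (which are Stable).
examples = [
    ("1 a", "\u0308b 2"),
    ("1 \u1100", "\u1161b 2"),
    ("1 \u1100", "\u1161\u11a8b 2"),
]
for a, b in examples:
    print(concat_nfc(a, b) == unicodedata.normalize("NFC", a + b))
```

Replacing is_stable with a plain ccc=0 test would stop the forward scan at U+1161 in the two Hangul examples and leave the jamo uncomposed, which is the failure described above.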
So it would be sufficient for concatenation to scan backwards in A to find the last Starter character X which is not CWP, and scan forwards in B to find the first Starter character Y which is not CWP. One can then normalize from (and including) character X to the character just before Y.

To enable implementations to use the QuickCheck property for this optimization, we should clarify that the QuickCheck property is defined as follows, and clarify how it is used in concatenation.

QuickCheck=NO   : Disallowed
QuickCheck=YES  : Starter which is *not* CWP (may be CWF or Stable)
QuickCheck=MAYBE: All others: non-Starters, CWP,...

[Note: one can optimize further. For scanning backwards in A, it is sufficient to go backwards to the last Starter. That character is safe to start from, since A is normalized: if that last Starter character could have combined with a previous character in A, it would have. If that last Starter is Stable, one can also optimize the concatenation to only reorder the final non-starters in A with the initial non-starters in B. However, in practice both of these tests narrow the range of characters we need to normalize only slightly and only rarely, so it is probably not worth the effort.]

=======================

BTW, here is a breakdown for NFC for those interested:

Starter, CWF-Only: 774
Starter, CWP-Only: 63
Starter, Stable (Neither CWF nor CWP): 1,111,746
Starter, Both CWF&CWP: 0
[Note: Both CWF&CWP is theoretically possible, but we don't have (currently) any characters that have both.]
Non-Starter, CWP: 38
Non-Starter, Non-CWP: 376
[Note: CWF is disallowed by definition in Non-Starters]
Disallowed: 1,115

And listings:

Starter, CWF-Only:

003C..003E # Sm [3] LESS-THAN SIGN..GREATER-THAN SIGN
0041..0050 # L& [16] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER P
0052..005A # L& [9] LATIN CAPITAL LETTER R..LATIN CAPITAL LETTER Z
0061..0070 # L& [16] LATIN SMALL LETTER A..LATIN SMALL LETTER P
0072..007A # L& [9] LATIN SMALL LETTER R..LATIN SMALL LETTER Z
00A8 # Sk DIAERESIS
00C2 # L& LATIN CAPITAL LETTER A WITH CIRCUMFLEX
00C4..00C7 # L& [4] LATIN CAPITAL LETTER A WITH DIAERESIS..LATIN CAPITAL LETTER C WITH CEDILLA
00CA # L& LATIN CAPITAL LETTER E WITH CIRCUMFLEX
00CF # L& LATIN CAPITAL LETTER I WITH DIAERESIS
00D4..00D6 # L& [3] LATIN CAPITAL LETTER O WITH CIRCUMFLEX..LATIN CAPITAL LETTER O WITH DIAERESIS
00D8 # L& LATIN CAPITAL LETTER O WITH STROKE
00DC # L& LATIN CAPITAL LETTER U WITH DIAERESIS
00E2 # L& LATIN SMALL LETTER A WITH CIRCUMFLEX
00E4..00E7 # L& [4] LATIN SMALL LETTER A WITH DIAERESIS..LATIN SMALL LETTER C WITH CEDILLA
00EA # L& LATIN SMALL LETTER E WITH CIRCUMFLEX
00EF # L& LATIN SMALL LETTER I WITH DIAERESIS
00F4..00F6 # L& [3] LATIN SMALL LETTER O WITH CIRCUMFLEX..LATIN SMALL LETTER O WITH DIAERESIS
00F8 # L& LATIN SMALL LETTER O WITH STROKE
00FC # L& LATIN SMALL LETTER U WITH DIAERESIS
0102..0103 # L& [2] LATIN CAPITAL LETTER A WITH BREVE..LATIN SMALL LETTER A WITH BREVE
0112..0113 # L& [2] LATIN CAPITAL LETTER E WITH MACRON..LATIN SMALL LETTER E WITH MACRON
014C..014D # L& [2] LATIN CAPITAL LETTER O WITH MACRON..LATIN SMALL LETTER O WITH MACRON
015A..015B # L& [2] LATIN CAPITAL LETTER S WITH ACUTE..LATIN SMALL LETTER S WITH ACUTE
0160..0161 # L& [2] LATIN CAPITAL LETTER S WITH CARON..LATIN SMALL LETTER S WITH CARON
0168..016B # L& [4] LATIN CAPITAL LETTER U WITH TILDE..LATIN SMALL LETTER U WITH MACRON
017F # L& LATIN SMALL LETTER LONG S
01A0..01A1 # L& [2] LATIN CAPITAL LETTER O WITH HORN..LATIN SMALL LETTER O WITH HORN
01AF..01B0 # L& [2] LATIN CAPITAL LETTER U WITH HORN..LATIN SMALL LETTER U WITH HORN
01B7 # L& LATIN CAPITAL LETTER EZH
01EA..01EB # L& [2] LATIN CAPITAL LETTER O WITH OGONEK..LATIN SMALL LETTER O WITH OGONEK
0226..0229 # L& [4] LATIN CAPITAL LETTER A WITH DOT ABOVE..LATIN SMALL LETTER E WITH CEDILLA
022E..022F # L& [2] LATIN CAPITAL LETTER O WITH DOT ABOVE..LATIN SMALL LETTER O WITH DOT ABOVE
0292 # L& LATIN SMALL LETTER EZH
0391 # L& GREEK CAPITAL LETTER ALPHA
0395 # L& GREEK CAPITAL LETTER EPSILON
0397 # L& GREEK CAPITAL LETTER ETA
0399 # L& GREEK CAPITAL LETTER IOTA
039F # L& GREEK CAPITAL LETTER OMICRON
03A1 # L& GREEK CAPITAL LETTER RHO
03A5 # L& GREEK CAPITAL LETTER UPSILON
03A9 # L& GREEK CAPITAL LETTER OMEGA
03AC # L& GREEK SMALL LETTER ALPHA WITH TONOS
03AE # L& GREEK SMALL LETTER ETA WITH TONOS
03B1 # L& GREEK SMALL LETTER ALPHA
03B5 # L& GREEK SMALL LETTER EPSILON
03B7 # L& GREEK SMALL LETTER ETA
03B9 # L& GREEK SMALL LETTER IOTA
03BF # L& GREEK SMALL LETTER OMICRON
03C1 # L& GREEK SMALL LETTER RHO
03C5 # L& GREEK SMALL LETTER UPSILON
03C9..03CB # L& [3] GREEK SMALL LETTER OMEGA..GREEK SMALL LETTER UPSILON WITH DIALYTIKA
03CE # L& GREEK SMALL LETTER OMEGA WITH TONOS
03D2 # L& GREEK UPSILON WITH HOOK SYMBOL
0406 # L& CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
0410 # L& CYRILLIC CAPITAL LETTER A
0413 # L& CYRILLIC CAPITAL LETTER GHE
0415..0418 # L& [4] CYRILLIC CAPITAL LETTER IE..CYRILLIC CAPITAL LETTER I
041A # L& CYRILLIC CAPITAL LETTER KA
041E # L& CYRILLIC CAPITAL LETTER O
0423 # L& CYRILLIC CAPITAL LETTER U
0427 # L& CYRILLIC CAPITAL LETTER CHE
042B # L& CYRILLIC CAPITAL LETTER YERU
042D # L& CYRILLIC CAPITAL LETTER E
0430 # L& CYRILLIC SMALL LETTER A
0433 # L& CYRILLIC SMALL LETTER GHE
0435..0438 # L& [4] CYRILLIC SMALL LETTER IE..CYRILLIC SMALL LETTER I
043A # L& CYRILLIC SMALL LETTER KA
043E # L& CYRILLIC SMALL LETTER O
0443 # L& CYRILLIC SMALL LETTER U
0447 # L& CYRILLIC SMALL LETTER CHE
044B # L& CYRILLIC SMALL LETTER YERU
044D # L& CYRILLIC SMALL LETTER E
0456 # L& CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
0474..0475 # L& [2] CYRILLIC CAPITAL LETTER IZHITSA..CYRILLIC SMALL LETTER IZHITSA
04D8..04D9 # L& [2] CYRILLIC CAPITAL LETTER SCHWA..CYRILLIC SMALL LETTER SCHWA
04E8..04E9 # L& [2] CYRILLIC CAPITAL LETTER BARRED O..CYRILLIC SMALL LETTER BARRED O
0627 # Lo ARABIC LETTER ALEF
0648 # Lo ARABIC LETTER WAW
064A # Lo ARABIC LETTER YEH
06C1 # Lo ARABIC LETTER HEH GOAL
06D2 # Lo ARABIC LETTER YEH BARREE
06D5 # Lo ARABIC LETTER AE
0928 # Lo DEVANAGARI LETTER NA
0930 # Lo DEVANAGARI LETTER RA
0933 # Lo DEVANAGARI LETTER LLA
09C7 # Mc BENGALI VOWEL SIGN E
0B47 # Mc ORIYA VOWEL SIGN E
0B92 # Lo TAMIL LETTER O
0BC6..0BC7 # Mc [2] TAMIL VOWEL SIGN E..TAMIL VOWEL SIGN EE
0C46 # Mn TELUGU VOWEL SIGN E
0CBF # Mn KANNADA VOWEL SIGN I
0CC6 # Mn KANNADA VOWEL SIGN E
0CCA # Mc KANNADA VOWEL SIGN O
0D46..0D47 # Mc [2] MALAYALAM VOWEL SIGN E..MALAYALAM VOWEL SIGN EE
0DD9 # Mc SINHALA VOWEL SIGN KOMBUVA
0DDC # Mc SINHALA VOWEL SIGN KOMBUVA HAA AELA-PILLA
1025 # Lo MYANMAR LETTER U
1100..1112 # Lo [19] HANGUL CHOSEONG KIYEOK..HANGUL CHOSEONG HIEUH
1E36..1E37 # L& [2] LATIN CAPITAL LETTER L WITH DOT BELOW..LATIN SMALL LETTER L WITH DOT BELOW
1E5A..1E5B # L& [2] LATIN CAPITAL LETTER R WITH DOT BELOW..LATIN SMALL LETTER R WITH DOT BELOW
1E62..1E63 # L& [2] LATIN CAPITAL LETTER S WITH DOT BELOW..LATIN SMALL LETTER S WITH DOT BELOW
1EA0..1EA1 # L& [2] LATIN CAPITAL LETTER A WITH DOT BELOW..LATIN SMALL LETTER A WITH DOT BELOW
1EB8..1EB9 # L& [2] LATIN CAPITAL LETTER E WITH DOT BELOW..LATIN SMALL LETTER E WITH DOT BELOW
1ECC..1ECD # L& [2] LATIN CAPITAL LETTER O WITH DOT BELOW..LATIN SMALL LETTER O WITH DOT BELOW
1F00..1F11 # L& [18] GREEK SMALL LETTER ALPHA WITH PSILI..GREEK SMALL LETTER EPSILON WITH DASIA
1F18..1F19 # L& [2] GREEK CAPITAL LETTER EPSILON WITH PSILI..GREEK CAPITAL LETTER EPSILON WITH DASIA
1F20..1F31 # L& [18] GREEK SMALL LETTER ETA WITH PSILI..GREEK SMALL LETTER IOTA WITH DASIA
1F38..1F39 # L& [2] GREEK CAPITAL LETTER IOTA WITH PSILI..GREEK CAPITAL LETTER IOTA WITH DASIA
1F40..1F41 # L& [2] GREEK SMALL LETTER OMICRON WITH PSILI..GREEK SMALL LETTER OMICRON WITH DASIA
1F48..1F49 # L& [2] GREEK CAPITAL LETTER OMICRON WITH PSILI..GREEK CAPITAL LETTER OMICRON WITH DASIA
1F50..1F51 # L& [2] GREEK SMALL LETTER UPSILON WITH PSILI..GREEK SMALL LETTER UPSILON WITH DASIA
1F59 # L& GREEK CAPITAL LETTER UPSILON WITH DASIA
1F60..1F70 # L& [17] GREEK SMALL LETTER OMEGA WITH PSILI..GREEK SMALL LETTER ALPHA WITH VARIA
1F74 # L& GREEK SMALL LETTER ETA WITH VARIA
1F7C # L& GREEK SMALL LETTER OMEGA WITH VARIA
1FB6 # L& GREEK SMALL LETTER ALPHA WITH PERISPOMENI
1FBF # Sk GREEK PSILI
1FC6 # L& GREEK SMALL LETTER ETA WITH PERISPOMENI
1FF6 # L& GREEK SMALL LETTER OMEGA WITH PERISPOMENI
1FFE # Sk GREEK DASIA
2190 # Sm LEFTWARDS ARROW
2192 # Sm RIGHTWARDS ARROW
2194 # Sm LEFT RIGHT ARROW
21D0 # So LEFTWARDS DOUBLE ARROW
21D2 # Sm RIGHTWARDS DOUBLE ARROW
21D4 # Sm LEFT RIGHT DOUBLE ARROW
2203 # Sm THERE EXISTS
2208 # Sm ELEMENT OF
220B # Sm CONTAINS AS MEMBER
2223 # Sm DIVIDES
2225 # Sm PARALLEL TO
223C # Sm TILDE OPERATOR
2243 # Sm ASYMPTOTICALLY EQUAL TO
2245 # Sm APPROXIMATELY EQUAL TO
2248 # Sm ALMOST EQUAL TO
224D # Sm EQUIVALENT TO
2261 # Sm IDENTICAL TO
2264..2265 # Sm [2] LESS-THAN OR EQUAL TO..GREATER-THAN OR EQUAL TO
2272..2273 # Sm [2] LESS-THAN OR EQUIVALENT TO..GREATER-THAN OR EQUIVALENT TO
2276..2277 # Sm [2] LESS-THAN OR GREATER-THAN..GREATER-THAN OR LESS-THAN
227A..227D # Sm [4] PRECEDES..SUCCEEDS OR EQUAL TO
2282..2283 # Sm [2] SUBSET OF..SUPERSET OF
2286..2287 # Sm [2] SUBSET OF OR EQUAL TO..SUPERSET OF OR EQUAL TO
2291..2292 # Sm [2] SQUARE IMAGE OF OR EQUAL TO..SQUARE ORIGINAL OF OR EQUAL TO
22A2 # Sm RIGHT TACK
22A8..22A9 # Sm [2] TRUE..FORCES
22AB # Sm DOUBLE VERTICAL BAR DOUBLE RIGHT TURNSTILE
22B2..22B5 # Sm [4] NORMAL SUBGROUP OF..CONTAINS AS NORMAL SUBGROUP OR EQUAL TO
3046 # Lo HIRAGANA LETTER U
304B # Lo HIRAGANA LETTER KA
304D # Lo HIRAGANA LETTER KI
304F # Lo HIRAGANA LETTER KU
3051 # Lo HIRAGANA LETTER KE
3053 # Lo HIRAGANA LETTER KO
3055 # Lo HIRAGANA LETTER SA
3057 # Lo HIRAGANA LETTER SI
3059 # Lo HIRAGANA LETTER SU
305B # Lo HIRAGANA LETTER SE
305D # Lo HIRAGANA LETTER SO
305F # Lo HIRAGANA LETTER TA
3061 # Lo HIRAGANA LETTER TI
3064 # Lo HIRAGANA LETTER TU
3066 # Lo HIRAGANA LETTER TE
3068 # Lo HIRAGANA LETTER TO
306F # Lo HIRAGANA LETTER HA
3072 # Lo HIRAGANA LETTER HI
3075 # Lo HIRAGANA LETTER HU
3078 # Lo HIRAGANA LETTER HE
307B # Lo HIRAGANA LETTER HO
309D # Lm HIRAGANA ITERATION MARK
30A6 # Lo KATAKANA LETTER U
30AB # Lo KATAKANA LETTER KA
30AD # Lo KATAKANA LETTER KI
30AF # Lo KATAKANA LETTER KU
30B1 # Lo KATAKANA LETTER KE
30B3 # Lo KATAKANA LETTER KO
30B5 # Lo KATAKANA LETTER SA
30B7 # Lo KATAKANA LETTER SI
30B9 # Lo KATAKANA LETTER SU
30BB # Lo KATAKANA LETTER SE
30BD # Lo KATAKANA LETTER SO
30BF # Lo KATAKANA LETTER TA
30C1 # Lo KATAKANA LETTER TI
30C4 # Lo KATAKANA LETTER TU
30C6 # Lo KATAKANA LETTER TE
30C8 # Lo KATAKANA LETTER TO
30CF # Lo KATAKANA LETTER HA
30D2 # Lo KATAKANA LETTER HI
30D5 # Lo KATAKANA LETTER HU
30D8 # Lo KATAKANA LETTER HE
30DB # Lo KATAKANA LETTER HO
30EF..30F2 # Lo [4] KATAKANA LETTER WA..KATAKANA LETTER WO
30FD # Lm KATAKANA ITERATION MARK
AC00 # Lo HANGUL SYLLABLE GA
[...other Hangul LV syllables...]
D788 # Lo HANGUL SYLLABLE HI
# Total code points: 774

Starter, CWP-Only:

09BE # Mc BENGALI VOWEL SIGN AA
09D7 # Mc BENGALI AU LENGTH MARK
0B3E # Mc ORIYA VOWEL SIGN AA
0B56 # Mn ORIYA AI LENGTH MARK
0B57 # Mc ORIYA AU LENGTH MARK
0BBE # Mc TAMIL VOWEL SIGN AA
0BD7 # Mc TAMIL AU LENGTH MARK
0CC2 # Mc KANNADA VOWEL SIGN UU
0CD5..0CD6 # Mc [2] KANNADA LENGTH MARK..KANNADA AI LENGTH MARK
0D3E # Mc MALAYALAM VOWEL SIGN AA
0D57 # Mc MALAYALAM AU LENGTH MARK
0DCF # Mc SINHALA VOWEL SIGN AELA-PILLA
0DDF # Mc SINHALA VOWEL SIGN GAYANUKITTA
102E # Mn MYANMAR VOWEL SIGN II
1161..1175 # Lo [21] HANGUL JUNGSEONG A..HANGUL JUNGSEONG I
11A8..11C2 # Lo [27] HANGUL JONGSEONG KIYEOK..HANGUL JONGSEONG HIEUH
# Total code points: 63

Starter, Stable:

[large list, omitted]
# Total code points: 1111746

Starter, Both:

# Total code points: 0

Non-Starter, CWP:

0300..0304 # Mn [5] COMBINING GRAVE ACCENT..COMBINING MACRON
0306..030C # Mn [7] COMBINING BREVE..COMBINING CARON
030F # Mn COMBINING DOUBLE GRAVE ACCENT
0311 # Mn COMBINING INVERTED BREVE
0313..0314 # Mn [2] COMBINING COMMA ABOVE..COMBINING REVERSED COMMA ABOVE
031B # Mn COMBINING HORN
0323..0328 # Mn [6] COMBINING DOT BELOW..COMBINING OGONEK
032D..032E # Mn [2] COMBINING CIRCUMFLEX ACCENT BELOW..COMBINING BREVE BELOW
0330..0331 # Mn [2] COMBINING TILDE BELOW..COMBINING MACRON BELOW
0338 # Mn COMBINING LONG SOLIDUS OVERLAY
0342 # Mn COMBINING GREEK PERISPOMENI
0345 # Mn COMBINING GREEK YPOGEGRAMMENI
0653..0655 # Mn [3] ARABIC MADDAH ABOVE..ARABIC HAMZA BELOW
093C # Mn DEVANAGARI SIGN NUKTA
0C56 # Mn TELUGU AI LENGTH MARK
0DCA # Mn SINHALA SIGN AL-LAKUNA
3099..309A # Mn [2] COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK..COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK
# Total code points: 38

Non-Starter, Non-CWP:

0305 # Mn COMBINING OVERLINE
030D..030E # Mn [2] COMBINING VERTICAL LINE ABOVE..COMBINING DOUBLE VERTICAL LINE ABOVE
0310 # Mn COMBINING CANDRABINDU
0312 # Mn COMBINING TURNED COMMA ABOVE
0315..031A # Mn [6] COMBINING COMMA ABOVE RIGHT..COMBINING LEFT ANGLE ABOVE
031C..0322 # Mn [7] COMBINING LEFT HALF RING BELOW..COMBINING RETROFLEX HOOK BELOW
0329..032C # Mn [4] COMBINING VERTICAL LINE BELOW..COMBINING CARON BELOW
032F # Mn COMBINING INVERTED BREVE BELOW
0332..0337 # Mn [6] COMBINING LOW LINE..COMBINING SHORT SOLIDUS OVERLAY
0339..033F # Mn [7] COMBINING RIGHT HALF RING BELOW..COMBINING DOUBLE OVERLINE
0346..034E # Mn [9] COMBINING BRIDGE ABOVE..COMBINING UPWARDS ARROW BELOW
0350..036F # Mn [32] COMBINING RIGHT ARROWHEAD ABOVE..COMBINING LATIN SMALL LETTER X
0483..0486 # Mn [4] COMBINING CYRILLIC TITLO..COMBINING CYRILLIC PSILI PNEUMATA
0591..05BD # Mn [45] HEBREW ACCENT ETNAHTA..HEBREW POINT METEG
05BF # Mn HEBREW POINT RAFE
05C1..05C2 # Mn [2] HEBREW POINT SHIN DOT..HEBREW POINT SIN DOT
05C4..05C5 # Mn [2] HEBREW MARK UPPER DOT..HEBREW MARK LOWER DOT
05C7 # Mn HEBREW POINT QAMATS QATAN
0610..0615 # Mn [6] ARABIC SIGN SALLALLAHOU ALAYHE WASSALLAM..ARABIC SMALL HIGH TAH
064B..0652 # Mn [8] ARABIC FATHATAN..ARABIC SUKUN
0656..065E # Mn [9] ARABIC SUBSCRIPT ALEF..ARABIC FATHA WITH TWO DOTS
0670 # Mn ARABIC LETTER SUPERSCRIPT ALEF
06D6..06DC # Mn [7] ARABIC SMALL HIGH LIGATURE SAD WITH LAM WITH ALEF MAKSURA..ARABIC SMALL HIGH SEEN
06DF..06E4 # Mn [6] ARABIC SMALL HIGH ROUNDED ZERO..ARABIC SMALL HIGH MADDA
06E7..06E8 # Mn [2] ARABIC SMALL HIGH YEH..ARABIC SMALL HIGH NOON
06EA..06ED # Mn [4] ARABIC EMPTY CENTRE LOW STOP..ARABIC SMALL LOW MEEM
0711 # Mn SYRIAC LETTER SUPERSCRIPT ALAPH
0730..074A # Mn [27] SYRIAC PTHAHA ABOVE..SYRIAC BARREKH
07EB..07F3 # Mn [9] NKO COMBINING SHORT HIGH TONE..NKO COMBINING DOUBLE DOT ABOVE
094D # Mn DEVANAGARI SIGN VIRAMA
0951..0954 # Mn [4] DEVANAGARI STRESS SIGN UDATTA..DEVANAGARI ACUTE ACCENT
09BC # Mn BENGALI SIGN NUKTA
09CD # Mn BENGALI SIGN VIRAMA
0A3C # Mn GURMUKHI SIGN NUKTA
0A4D # Mn GURMUKHI SIGN VIRAMA
0ABC # Mn GUJARATI SIGN NUKTA
0ACD # Mn GUJARATI SIGN VIRAMA
0B3C # Mn ORIYA SIGN NUKTA
0B4D # Mn ORIYA SIGN VIRAMA
0BCD # Mn TAMIL SIGN VIRAMA
0C4D # Mn TELUGU SIGN VIRAMA
0C55 # Mn TELUGU LENGTH MARK
0CBC # Mn KANNADA SIGN NUKTA
0CCD # Mn KANNADA SIGN VIRAMA
0D4D # Mn MALAYALAM SIGN VIRAMA
0E38..0E3A # Mn [3] THAI CHARACTER SARA U..THAI CHARACTER PHINTHU
0E48..0E4B # Mn [4] THAI CHARACTER MAI EK..THAI CHARACTER MAI CHATTAWA
0EB8..0EB9 # Mn [2] LAO VOWEL SIGN U..LAO VOWEL SIGN UU
0EC8..0ECB # Mn [4] LAO TONE MAI EK..LAO TONE MAI CATAWA
0F18..0F19 # Mn [2] TIBETAN ASTROLOGICAL SIGN -KHYUD PA..TIBETAN ASTROLOGICAL SIGN SDONG TSHUGS
0F35 # Mn TIBETAN MARK NGAS BZUNG NYI ZLA
0F37 # Mn TIBETAN MARK NGAS BZUNG SGOR RTAGS
0F39 # Mn TIBETAN MARK TSA -PHRU
0F71..0F72 # Mn [2] TIBETAN VOWEL SIGN AA..TIBETAN VOWEL SIGN I
0F74 # Mn TIBETAN VOWEL SIGN U
0F7A..0F7D # Mn [4] TIBETAN VOWEL SIGN E..TIBETAN VOWEL SIGN OO
0F80 # Mn TIBETAN VOWEL SIGN REVERSED I
0F82..0F84 # Mn [3] TIBETAN SIGN NYI ZLA NAA DA..TIBETAN MARK HALANTA
0F86..0F87 # Mn [2] TIBETAN SIGN LCI RTAGS..TIBETAN SIGN YANG RTAGS
0FC6 # Mn TIBETAN SYMBOL PADMA GDAN
1037 # Mn MYANMAR SIGN DOT BELOW
1039 # Mn MYANMAR SIGN VIRAMA
135F # Mn ETHIOPIC COMBINING GEMINATION MARK
1714 # Mn TAGALOG SIGN VIRAMA
1734 # Mn HANUNOO SIGN PAMUDPOD
17D2 # Mn KHMER SIGN COENG
17DD # Mn KHMER SIGN ATTHACAN
18A9 # Mn MONGOLIAN LETTER ALI GALI DAGALGA
1939..193B # Mn [3] LIMBU SIGN MUKPHRENG..LIMBU SIGN SA-I
1A17..1A18 # Mn [2] BUGINESE VOWEL SIGN I..BUGINESE VOWEL SIGN U
1B34 # Mn BALINESE SIGN REREKAN
1B44 # Mc BALINESE ADEG ADEG
1B6B..1B73 # Mn [9] BALINESE MUSICAL SYMBOL COMBINING TEGEH..BALINESE MUSICAL SYMBOL COMBINING GONG
1DC0..1DCA # Mn [11] COMBINING DOTTED GRAVE ACCENT..COMBINING LATIN SMALL LETTER R BELOW
1DFE..1DFF # Mn [2] COMBINING LEFT ARROWHEAD ABOVE..COMBINING RIGHT ARROWHEAD AND DOWN ARROWHEAD BELOW
20D0..20DC # Mn [13] COMBINING LEFT HARPOON ABOVE..COMBINING FOUR DOTS ABOVE
20E1 # Mn COMBINING LEFT RIGHT ARROW ABOVE
20E5..20EF # Mn [11] COMBINING REVERSE SOLIDUS OVERLAY..COMBINING RIGHT ARROW BELOW
302A..302F # Mn [6] IDEOGRAPHIC LEVEL TONE MARK..HANGUL DOUBLE DOT TONE MARK
A806 # Mn SYLOTI NAGRI SIGN HASANTA
FB1E # Mn HEBREW POINT JUDEO-SPANISH VARIKA
FE20..FE23 # Mn [4] COMBINING LIGATURE LEFT HALF..COMBINING DOUBLE TILDE RIGHT HALF
10A0D # Mn KHAROSHTHI SIGN DOUBLE RING BELOW
10A0F # Mn KHAROSHTHI SIGN VISARGA
10A38..10A3A # Mn [3] KHAROSHTHI SIGN BAR ABOVE..KHAROSHTHI SIGN DOT BELOW
10A3F # Mn KHAROSHTHI VIRAMA
1D165..1D166 # Mc [2] MUSICAL SYMBOL COMBINING STEM..MUSICAL SYMBOL COMBINING SPRECHGESANG STEM
1D167..1D169 # Mn [3] MUSICAL SYMBOL COMBINING TREMOLO-1..MUSICAL SYMBOL COMBINING TREMOLO-3
1D16D..1D172 # Mc [6] MUSICAL SYMBOL COMBINING AUGMENTATION DOT..MUSICAL SYMBOL COMBINING FLAG-5
1D17B..1D182 # Mn [8] MUSICAL SYMBOL COMBINING ACCENT..MUSICAL SYMBOL COMBINING LOURE
1D185..1D18B # Mn [7] MUSICAL SYMBOL COMBINING DOIT..MUSICAL SYMBOL COMBINING TRIPLE TONGUE
1D1AA..1D1AD # Mn [4] MUSICAL SYMBOL COMBINING DOWN BOW..MUSICAL SYMBOL COMBINING SNAP PIZZICATO
1D242..1D244 # Mn [3] COMBINING GREEK MUSICAL TRISEME..COMBINING GREEK MUSICAL PENTASEME
# Total code points: 376

Disallowed:

0340..0341 # Mn [2] COMBINING GRAVE TONE MARK..COMBINING ACUTE TONE MARK
0343..0344 # Mn [2] COMBINING GREEK KORONIS..COMBINING GREEK DIALYTIKA TONOS
0374 # Sk GREEK NUMERAL SIGN
037E # Po GREEK QUESTION MARK
0387 # Po GREEK ANO TELEIA
0958..095F # Lo [8] DEVANAGARI LETTER QA..DEVANAGARI LETTER YYA
09DC..09DD # Lo [2] BENGALI LETTER RRA..BENGALI LETTER RHA
09DF # Lo BENGALI LETTER YYA
0A33 # Lo GURMUKHI LETTER LLA
0A36 # Lo GURMUKHI LETTER SHA
0A59..0A5B # Lo [3] GURMUKHI LETTER KHHA..GURMUKHI LETTER ZA
0A5E # Lo GURMUKHI LETTER FA
0B5C..0B5D # Lo [2] ORIYA LETTER RRA..ORIYA LETTER RHA
0F43 # Lo TIBETAN LETTER GHA
0F4D # Lo TIBETAN LETTER DDHA
0F52 # Lo TIBETAN LETTER DHA
0F57 # Lo TIBETAN LETTER BHA
0F5C # Lo TIBETAN LETTER DZHA
0F69 # Lo TIBETAN LETTER KSSA
0F73 # Mn TIBETAN VOWEL SIGN II
0F75..0F76 # Mn [2] TIBETAN VOWEL SIGN UU..TIBETAN VOWEL SIGN VOCALIC R
0F78 # Mn TIBETAN VOWEL SIGN VOCALIC L
0F81 # Mn TIBETAN VOWEL SIGN REVERSED II
0F93 # Mn TIBETAN SUBJOINED LETTER GHA
0F9D # Mn TIBETAN SUBJOINED LETTER DDHA
0FA2 # Mn TIBETAN SUBJOINED LETTER DHA
0FA7 # Mn TIBETAN SUBJOINED LETTER BHA
0FAC # Mn TIBETAN SUBJOINED LETTER DZHA
0FB9 # Mn TIBETAN SUBJOINED LETTER KSSA
1F71 # L& GREEK SMALL LETTER ALPHA WITH OXIA
1F73 # L& GREEK SMALL LETTER EPSILON WITH OXIA
1F75 # L& GREEK SMALL LETTER ETA WITH OXIA
1F77 # L& GREEK SMALL LETTER IOTA WITH OXIA
1F79 # L& GREEK SMALL LETTER OMICRON WITH OXIA
1F7B # L& GREEK SMALL LETTER UPSILON WITH OXIA
1F7D # L& GREEK SMALL LETTER OMEGA WITH OXIA
1FBB # L& GREEK CAPITAL LETTER ALPHA WITH OXIA
1FBE # L& GREEK PROSGEGRAMMENI
1FC9 # L& GREEK CAPITAL LETTER EPSILON WITH OXIA
1FCB # L& GREEK CAPITAL LETTER ETA WITH OXIA
1FD3 # L& GREEK SMALL LETTER IOTA WITH DIALYTIKA AND OXIA
1FDB # L& GREEK CAPITAL LETTER IOTA WITH OXIA
1FE3 # L& GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND OXIA
1FEB # L& GREEK CAPITAL LETTER UPSILON WITH OXIA
1FEE..1FEF # Sk [2] GREEK DIALYTIKA AND OXIA..GREEK VARIA
1FF9 # L& GREEK CAPITAL LETTER OMICRON WITH OXIA
1FFB # L& GREEK CAPITAL LETTER OMEGA WITH OXIA
1FFD # Sk GREEK OXIA
2000..2001 # Zs [2] EN QUAD..EM QUAD
2126 # L& OHM SIGN
212A..212B # L& [2] KELVIN SIGN..ANGSTROM SIGN
2329 # Ps LEFT-POINTING ANGLE BRACKET
232A # Pe RIGHT-POINTING ANGLE BRACKET
2ADC # Sm FORKING
F900..FA0D # Lo [270] CJK COMPATIBILITY IDEOGRAPH-F900..CJK COMPATIBILITY IDEOGRAPH-FA0D
FA10 # Lo CJK COMPATIBILITY IDEOGRAPH-FA10
FA12 # Lo CJK COMPATIBILITY IDEOGRAPH-FA12
FA15..FA1E # Lo [10] CJK COMPATIBILITY IDEOGRAPH-FA15..CJK COMPATIBILITY IDEOGRAPH-FA1E
FA20 # Lo CJK COMPATIBILITY IDEOGRAPH-FA20
FA22 # Lo CJK COMPATIBILITY IDEOGRAPH-FA22
FA25..FA26 # Lo [2] CJK COMPATIBILITY IDEOGRAPH-FA25..CJK COMPATIBILITY IDEOGRAPH-FA26
FA2A..FA2D # Lo [4] CJK COMPATIBILITY IDEOGRAPH-FA2A..CJK COMPATIBILITY IDEOGRAPH-FA2D
FA30..FA6A # Lo [59] CJK COMPATIBILITY IDEOGRAPH-FA30..CJK COMPATIBILITY IDEOGRAPH-FA6A
FA70..FAD9 # Lo [106] CJK COMPATIBILITY IDEOGRAPH-FA70..CJK COMPATIBILITY IDEOGRAPH-FAD9
FB1D # Lo HEBREW LETTER YOD WITH HIRIQ
FB1F # Lo HEBREW LIGATURE YIDDISH YOD YOD PATAH
FB2A..FB36 # Lo [13] HEBREW LETTER SHIN WITH SHIN DOT..HEBREW LETTER ZAYIN WITH DAGESH
FB38..FB3C # Lo [5] HEBREW LETTER TET WITH DAGESH..HEBREW LETTER LAMED WITH DAGESH
FB3E # Lo HEBREW LETTER MEM WITH DAGESH
FB40..FB41 # Lo [2] HEBREW LETTER NUN WITH DAGESH..HEBREW LETTER SAMEKH WITH DAGESH
FB43..FB44 # Lo [2] HEBREW LETTER FINAL PE WITH DAGESH..HEBREW LETTER PE WITH DAGESH
FB46..FB4E # Lo [9] HEBREW LETTER TSADI WITH DAGESH..HEBREW LETTER PE WITH RAFE
1D15E..1D164 # So [7] MUSICAL SYMBOL HALF NOTE..MUSICAL SYMBOL ONE HUNDRED TWENTY-EIGHTH NOTE
1D1BB..1D1C0 # So [6] MUSICAL SYMBOL MINIMA..MUSICAL SYMBOL FUSA BLACK
2F800..2FA1D # Lo [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D
# Total code points: 1115

Done
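For readers who want to regenerate a breakdown like the one above, here is a minimal Python sketch that sorts every code point into the same categories using the standard unicodedata module. The helper names composition_sets and classify are invented for this example, and "Disallowed" is approximated as "the code point changes under NFC". Because the module carries the Unicode data of the running interpreter, the counts it prints will generally differ from the figures in this document.

```python
import unicodedata
from collections import Counter


def composition_sets():
    """Return (CWF, CWP) for NFC, derived from canonical pairs (hypothetical helper)."""
    cwf, cwp = set(), set()
    for cp in range(0x110000):
        ch = chr(cp)
        mapping = unicodedata.decomposition(ch)
        if not mapping or mapping.startswith("<"):
            continue  # nothing canonical to learn from this code point
        parts = mapping.split()
        # Only pairs that NFC recomposes count (skips composition exclusions).
        if len(parts) == 2 and unicodedata.normalize("NFC", ch) == ch:
            cwf.add(chr(int(parts[0], 16)))
            cwp.add(chr(int(parts[1], 16)))
    # Hangul jamo and LV syllables compose algorithmically.
    cwf.update(chr(c) for c in range(0x1100, 0x1113))
    cwp.update(chr(c) for c in range(0x1161, 0x1176))
    cwp.update(chr(c) for c in range(0x11A8, 0x11C3))
    cwf.update(chr(c) for c in range(0xAC00, 0xD7A4, 28))
    return cwf, cwp


CWF, CWP = composition_sets()


def classify(ch):
    """Place one code point in a breakdown category."""
    if unicodedata.normalize("NFC", ch) != ch:
        return "Disallowed"
    with_prev, with_next = ch in CWP, ch in CWF
    if unicodedata.combining(ch) != 0:
        return "Non-Starter, CWP" if with_prev else "Non-Starter, Non-CWP"
    if with_prev and with_next:
        return "Starter, Both CWF&CWP"
    if with_next:
        return "Starter, CWF-Only"
    if with_prev:
        return "Starter, CWP-Only"
    return "Starter, Stable"


# Tally the whole code space, unassigned code points included.
tally = Counter(classify(chr(cp)) for cp in range(0x110000))
for label, count in sorted(tally.items()):
    print(f"{label}: {count:,}")
```

Under the QuickCheck definition proposed earlier, "Disallowed" corresponds to NO, the starter categories without CWP correspond to YES, and everything else to MAYBE.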