Re: U+0F81 - Unicode 4.0 normalization error (missing exclusion for "Tibetan Vowel Sign Reversed II")

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Mon May 12 2003 - 09:53:03 EDT


    I don't know whether this message reached the list, as I don't see it in the
    archive, so this is a repost.
    Sorry if you already received it.

    ----- Original Message -----
    From: "Philippe Verdy" <verdy_p@wanadoo.fr>
    To: <unicode@unicode.org>
    Cc: "Mark Davis" <mark.davis@us.ibm.com>; "Martin Dürst" <duerst@w3.org>
    Sent: Monday, May 12, 2003 2:29 AM
    Subject: Unicode 4.0 normalization error (missing exclusion for "Tibetan Vowel Sign Reversed II")

    > After some tests, I found that one character defined in the test file is
    > excluded from canonical recomposition:
    >
    > This normalization test chart:
    > http://www.unicode.org/Public/UNIDATA/NormalizationTest.txt
    > lists:
    >
    > 0F81; 0F71 0F80; 0F71 0F80; 0F71 0F80; 0F71 0F80 # (◌ཱྀ; ◌ཱ◌ྀ; ◌ཱ◌ྀ; ◌ཱ◌ྀ; ◌ཱ◌ྀ; ) TIBETAN VOWEL SIGN REVERSED II
    >
    > However, I don't know why it is not listed in
    > http://www.unicode.org/Public/4.0-Update/CompositionExclusions-4.0.0.txt
    >
    > It should list all the Tibetan decompositions (the others are already
    > included in the normalization test chart):
    > 0F43; 0F42 0FB7; TIBETAN LETTER GHA
    > 0F4D; 0F4C 0FB7; TIBETAN LETTER DDHA
    > 0F52; 0F51 0FB7; TIBETAN LETTER DHA
    > 0F57; 0F56 0FB7; TIBETAN LETTER BHA
    > 0F5C; 0F5B 0FB7; TIBETAN LETTER DZHA
    > 0F69; 0F40 0FB5; TIBETAN LETTER KSSA
    > 0F73; 0F71 0F72; TIBETAN VOWEL SIGN II
    > 0F75; 0F71 0F74; TIBETAN VOWEL SIGN UU
    > 0F76; 0FB2 0F80; TIBETAN VOWEL SIGN VOCALIC R
    > 0F78; 0FB3 0F80; TIBETAN VOWEL SIGN VOCALIC L
    > 0F81; 0F71 0F80; TIBETAN VOWEL SIGN REVERSED II
    > 0F93; 0F92 0FB7; TIBETAN SUBJOINED LETTER GHA
    > 0F9D; 0F9C 0FB7; TIBETAN SUBJOINED LETTER DDHA
    > 0FA2; 0FA1 0FB7; TIBETAN SUBJOINED LETTER DHA
    > 0FA7; 0FA6 0FB7; TIBETAN SUBJOINED LETTER BHA
    > 0FAC; 0FAB 0FB7; TIBETAN SUBJOINED LETTER DZHA
    > 0FB9; 0F90 0FB5; TIBETAN SUBJOINED LETTER KSSA
    >
    > I think this is an inconsistency, and CompositionExclusions-4.0.0.txt needs
    > to be corrected to include this character.
    > I did not find a corrigendum for this case.
    >
    > As the UCD and CompositionExclusions are normative and the normalization
    > test chart is mostly informative, I think this will create bugs depending
    > on which file is used to generate NFC/NFD conversion tables.
    >
    > But the standard also mandates testing the generated normalizer with this
    > test file (Normative Annex 9 mandates this test, yet describes the test
    > file as provided "for convenience"). So a normalizer built according to
    > the normative UCD and exclusions will fail the test for this Tibetan
    > character.
    >
    > This is the only character I found in all the new UCD 4.0.0 that exhibits
    > this problem.
    >
    > This should be corrected while the new standard is in the "prepublication"
    > state, before the book is published. If it is already printed, this could
    > be done by publishing an online alert before the book is distributed, or by
    > adding a corrigendum sheet to the printed book, because TR15 is extremely
    > critical and is now a full part of the standard as UAX #15.
    >
    > --Philippe.
    >
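
The two checks described in the message (which Tibetan characters have canonical decompositions, and whether a normalizer reproduces the 0F81 line of the test file) can be sketched in Python. This is a minimal illustration, not the procedure the author used. It assumes Python's built-in `unicodedata` module as the normalizer under test; that module ships a later Unicode version than 4.0.0 (reported by `unicodedata.unidata_version`), so it reflects the data as published afterwards rather than the 4.0.0 prepublication files. The helper names are hypothetical. The invariants in `check_test_line` are the conformance conditions stated in the header of NormalizationTest.txt.

```python
import unicodedata

# Assumption: Python's bundled unicodedata module plays the role of the
# normalizer under test. Its data is newer than Unicode 4.0.0; see
# unicodedata.unidata_version. All helper names here are hypothetical.


def tibetan_canonical_decompositions():
    """Map each code point in U+0F00..U+0FFF to its canonical decomposition."""
    result = {}
    for cp in range(0x0F00, 0x1000):
        decomp = unicodedata.decomposition(chr(cp))
        # Skip characters without a decomposition, and compatibility
        # decompositions, which carry a tag such as "<compat>".
        if decomp and not decomp.startswith("<"):
            result[cp] = [int(h, 16) for h in decomp.split()]
    return result


def recomposes_under_nfc(cp):
    """True if NFC turns the NFD form back into the precomposed character."""
    nfd = unicodedata.normalize("NFD", chr(cp))
    return unicodedata.normalize("NFC", nfd) == chr(cp)


def parse_test_line(line):
    """Split a NormalizationTest.txt data line into its columns c1..c5."""
    data = line.split("#", 1)[0]
    fields = [f.strip() for f in data.split(";")][:5]
    return ["".join(chr(int(h, 16)) for h in f.split()) for f in fields]


def check_test_line(line):
    """Check the NFC/NFD/NFKC/NFKD invariants from the test file's header."""
    c1, c2, c3, c4, c5 = parse_test_line(line)
    cols = (c1, c2, c3, c4, c5)

    def norm(form, s):
        return unicodedata.normalize(form, s)

    return (
        all(norm("NFC", c) == c2 for c in (c1, c2, c3))
        and all(norm("NFC", c) == c4 for c in (c4, c5))
        and all(norm("NFD", c) == c3 for c in (c1, c2, c3))
        and all(norm("NFD", c) == c5 for c in (c4, c5))
        and all(norm("NFKC", c) == c4 for c in cols)
        and all(norm("NFKD", c) == c5 for c in cols)
    )


if __name__ == "__main__":
    print("Unicode data version:", unicodedata.unidata_version)
    for cp, parts in tibetan_canonical_decompositions().items():
        hexes = " ".join(f"{p:04X}" for p in parts)
        print(f"{cp:04X} -> {hexes}  NFC recomposes: {recomposes_under_nfc(cp)}  "
              f"{unicodedata.name(chr(cp))}")
    line = "0F81; 0F71 0F80; 0F71 0F80; 0F71 0F80; 0F71 0F80 # TIBETAN VOWEL SIGN REVERSED II"
    print("0F81 test line passes:", check_test_line(line))
```

Running it prints each Tibetan canonical decomposition with a flag saying whether NFC recomposes it, followed by the result for the 0F81 line. A generated normalizer could be checked the same way by substituting its own normalization function for `unicodedata.normalize` inside `check_test_line`.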



    This archive was generated by hypermail 2.1.5 : Mon May 12 2003 - 10:46:03 EDT