Confusion about composition

From: Clark Cox (
Date: Sat Jan 10 2004 - 11:22:48 EST

    I'm in the process of writing several normalization routines, and
    testing them against NormalizationTest.txt. The code that I use to do
    the composition for NFC and NFKC seems to work for every line in the
    test file except for 21 of them. An example of where my routine
    falls down is with the line:

            1026;1026;1025 102E;1026;1025 102E; # (ဦ; ဦ; ဥ◌ီ; ဦ; ဥ◌ီ; ) MYANMAR

    According to the comment at the beginning of the file, and all that
    I've read elsewhere, toNFC(U+1025 U+102E) should result in U+1026.
    However, both U+1025 and U+102E have a combining class of zero, so my
    code does not compose those characters. Nothing that I've been able to
    find explains this discrepancy. Any help would be greatly
    appreciated.
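
    To make the case concrete, here is a minimal Python sketch that
    reproduces it with the standard unicodedata module. The helper
    is_blocked is hypothetical (it is not part of my routines or of any
    library) and encodes my reading of the "blocked" rule in UAX #15: a
    character C is blocked from a starter S only if some character B lies
    between them and B is either a starter or has a combining class at
    least as high as that of C. If that reading is right, an adjacent pair
    is never blocked, so a combining class of zero on the second character
    would not by itself rule out composition.

```python
import unicodedata


def is_blocked(text: str, s: int, c: int) -> bool:
    """Return True if text[c] is blocked from the starter text[s] (s < c).

    Hypothetical helper: my reading of the UAX #15 rule, not library code.
    """
    ccc_c = unicodedata.combining(text[c])
    # Only characters strictly between S and C can block C.
    for b in text[s + 1:c]:
        ccc_b = unicodedata.combining(b)
        if ccc_b == 0 or ccc_b >= ccc_c:
            return True
    return False


pair = "\u1025\u102e"  # MYANMAR LETTER U + MYANMAR VOWEL SIGN II

# Canonical combining classes of the two characters.
classes = [unicodedata.combining(ch) for ch in pair]

# What the reference normalizer produces for the pair.
composed = unicodedata.normalize("NFC", pair)

print(classes)
print(" ".join(f"U+{ord(ch):04X}" for ch in composed))
print(is_blocked(pair, 0, 1))
```

    The same check could be run over each failing line of
    NormalizationTest.txt to see whether all of the failures share this
    starter-followed-by-starter shape.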

    Clark S. Cox III