Re: Unicode Normalisation Optimisation Experiments

From: Peter Kirk (peterkirk@qaya.org)
Date: Thu Sep 25 2003 - 06:08:47 EDT


    On 24/09/2003 14:58, Jon Hanna wrote:

    >... For example since following the decomposition <U+0104> -> <U+0041, U+0328> there can be no character that is unblocked from the U+0041 that will combine with it, hence there is no circumstance in which they will not be recombined to U+0104 and hence dropping that decomposition from the data will not affect NFC (the relevant data would still have to be in the composition table, as the sequence <U+0041, U+0328> might occur in the source code).
    >
    Is this actually correct? For example, if I have in my data the string
    <U+0104, U+05B0> (which I know is garbage, but that is irrelevant), that
    will decompose and reorder to <U+0041, U+05B0, U+0328>, as U+0328 has a
    higher combining class (202) than U+05B0 (10). What does this become in
    NFC? Is the reordering reversed and the combination reapplied?
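    A quick way to check this case is Python's standard unicodedata
    module. This is only a sketch: it uses whatever Unicode version the
    interpreter ships with, which is much newer than the data current in
    2003, and the helper codepoints() is a hypothetical display function,
    not part of any library.

```python
import unicodedata

def codepoints(text):
    """Hypothetical helper: format a string as <U+XXXX, ...> for display."""
    return "<" + ", ".join(f"U+{ord(c):04X}" for c in text) + ">"

# <U+0104 LATIN CAPITAL LETTER A WITH OGONEK, U+05B0 HEBREW POINT SHEVA>
s = "\u0104\u05B0"

# Canonical combining classes: sheva (10) sorts before ogonek (202).
for ch in "\u05B0\u0328":
    print(codepoints(ch), unicodedata.name(ch), unicodedata.combining(ch))

nfd = unicodedata.normalize("NFD", s)  # decompose, then canonically reorder
nfc = unicodedata.normalize("NFC", s)  # decompose, reorder, then recompose
print("NFD:", codepoints(nfd))
print("NFC:", codepoints(nfc))
```

    With current data this prints NFD: <U+0041, U+05B0, U+0328> and
    NFC: <U+0104, U+05B0>. In the composition step U+0328 is not blocked
    from the starter U+0041, because the only character between them
    (U+05B0) has a nonzero combining class lower than 202, so the pair
    recombines to U+0104 and the sheva ends up after it.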

    This is not only a theoretical issue as the same applies to some real
    combinations. There was discussion only last week on the bidi list of a
    form which might be encoded <U+064A, U+0652, U+0654> but which would be
    messed up if composed into <U+0626, U+0652>.
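    The same check can be run on the Arabic sequence. Again this is a
    sketch against the Unicode data bundled with a current Python, not
    the 2003 tables, and codepoints() is a hypothetical display helper.

```python
import unicodedata

def codepoints(text):
    """Hypothetical helper: format a string as <U+XXXX, ...> for display."""
    return "<" + ", ".join(f"U+{ord(c):04X}" for c in text) + ">"

# <U+064A ARABIC LETTER YEH, U+0652 ARABIC SUKUN, U+0654 ARABIC HAMZA ABOVE>
seq = "\u064A\u0652\u0654"

# Sukun (34) and hamza above (230) are already in canonical order.
for ch in seq[1:]:
    print(codepoints(ch), unicodedata.name(ch), unicodedata.combining(ch))

nfc = unicodedata.normalize("NFC", seq)
print("NFC:", codepoints(nfc))
print("Unchanged by NFC:", nfc == seq)
```

    With current data NFC yields <U+0626, U+0652>: the hamza is not
    blocked from the yeh by the lower-class sukun, so U+064A + U+0654
    composes to U+0626. Both sequences decompose to the same NFD form,
    so normalisation treats them as canonically equivalent.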

    -- 
    Peter Kirk
    peter@qaya.org (personal)
    peterkirk@qaya.org (work)
    http://www.qaya.org/
    


    This archive was generated by hypermail 2.1.5 : Thu Sep 25 2003 - 06:58:03 EDT