Re: Level of Unicode support required for various languages

Date: Wed Oct 31 2007 - 06:53:01 CST

    Quoting Andrew West:

    > The glyph components <4E36 6B79 706C>, <4EA0 5915 706C> and <4E36 4E00
    > 5915 706C> are not equivalent to the <4EA0 7CF8> and so none of the
    > IDS sequences with these glyph component sequences should be
    > considered alternate representations of U-22F7A.
    > U+6534 and U+6535 are non-unifiable components, so IDS sequences with
    > 6534 should represent a different character than those sequences with
    > 6535.
    > On the other hand, U+7CF8 and U+7CF9 are unifiable glyph variants, and
    > therefore which one is used in the IDS sequence is not significant for
    > character matching purposes.
    > And the sequence <2FF1 4E36 4E00> is a decomposition [s.l.] of 4EA0,
    > and so IDS sequences with either <2FF1 4E36 4E00> or 4EA0 are
    > equivalent.

    > I'm not quite sure what the point of the exercise is. We all know
    > that there may be multiple ways of representing the same character
    > using IDS sequences, but any process that is designed to work with IDS
    > sequences should normalize [s.l.] sequences so that alternate
    > representations are treated as identical, e.g. in this example
    > normalize 7CF9 to 7CF8 (unifiable glyph variants), and normalize <4E36
    > 4E00> to 4EA0 (normalize to the shortest possible sequence).
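
The two normalisations Andrew describes can be sketched roughly as follows. This is only an illustration, not what search.cgi does: the names VARIANTS, COMPOSITIONS and normalize are hypothetical, the tables hold just the two mappings mentioned above, and an IDS is assumed to be a list of code points in prefix order.

```python
# Hypothetical tables holding only the mappings named in the quoted message.
VARIANTS = {0x7CF9: 0x7CF8}                        # unifiable glyph variants
COMPOSITIONS = {(0x2FF1, 0x4E36, 0x4E00): 0x4EA0}  # decomposition -> shortest form


def normalize(seq):
    """Return a normalised copy of an IDS given as a list of code points."""
    # Step 1: fold unifiable glyph variants onto one representative.
    out = [VARIANTS.get(cp, cp) for cp in seq]
    # Step 2: replace known decompositions by the single character,
    # repeating until nothing changes. Each pattern is itself a complete
    # IDS, so a contiguous match in prefix notation is a whole subtree.
    changed = True
    while changed:
        changed = False
        for pattern, composed in COMPOSITIONS.items():
            n = len(pattern)
            i = 0
            while i + n <= len(out):
                if tuple(out[i:i + n]) == pattern:
                    out[i:i + n] = [composed]
                    changed = True
                i += 1
    return out


# Illustrative sequences only, not the actual IDS of any encoded character.
a = [0x2FF1, 0x2FF1, 0x4E36, 0x4E00, 0x7CF9]
b = [0x2FF1, 0x4EA0, 0x7CF8]
same = normalize(a) == normalize(b)

# 6534 and 6535 are not unifiable, so they have no entry in VARIANTS
# and sequences that differ only in them stay distinct.
distinct = normalize([0x2FF0, 0x7CF8, 0x6534]) != normalize([0x2FF0, 0x7CF8, 0x6535])
```

With these tables, a and b normalise to the same list, while the 6534 and 6535 sequences remain different, which matches the matching behaviour described in the quoted text.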

    I still have to add normalisation to search.cgi (very beta); this will
    be done ... As pointed out above, there are different types of
    normalisation issue: decompositional and unifiable. To these one can
    add a structural one: 2FF2 xyz = 2FF0 x 2FF0 yz = 2FF0 2FF0 xyz
    [there are of course more complicated sequences].
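
A rough sketch of how the structural case could be handled, under the assumption that an IDS is a list of code points in prefix order, that U+2FF0..U+2FFB are the operators, and that 2FF2 and 2FF3 take three operands while the rest take two. The names parse, flatten, canonical and CHAIN are hypothetical, and only the 2FF2/2FF0 equivalence stated above is encoded.

```python
TERNARY = {0x2FF2, 0x2FF3}
BINARY = set(range(0x2FF0, 0x2FFC)) - TERNARY
# Ternary operator -> the binary operator it is treated as a chain of.
# Only the 2FF2/2FF0 pair is taken from the message.
CHAIN = {0x2FF2: 0x2FF0}


def parse(seq, pos=0):
    """Parse a prefix-notation IDS; return (tree, next position)."""
    cp = seq[pos]
    pos += 1
    arity = 3 if cp in TERNARY else 2 if cp in BINARY else 0
    if arity == 0:
        return cp, pos          # a leaf is just its code point
    kids = []
    for _ in range(arity):
        kid, pos = parse(seq, pos)
        kids.append(kid)
    return (cp, kids), pos


def flatten(tree):
    """Rewrite 2FF2 as 2FF0 and merge nested 2FF0 nodes into one n-ary node."""
    if isinstance(tree, int):
        return tree
    op, kids = tree
    op = CHAIN.get(op, op)
    merged = []
    for kid in map(flatten, kids):
        if op in CHAIN.values() and isinstance(kid, tuple) and kid[0] == op:
            merged.extend(kid[1])   # absorb a nested chain of the same operator
        else:
            merged.append(kid)
    return (op, tuple(merged))


def canonical(seq):
    """Return a comparable canonical tree for a complete IDS."""
    tree, end = parse(seq)
    if end != len(seq):
        raise ValueError("trailing code points after a complete IDS")
    return flatten(tree)


# Arbitrary placeholder components standing in for x, y and z.
x, y, z = 0x4E00, 0x4E8C, 0x4E09
forms = [
    [0x2FF2, x, y, z],
    [0x2FF0, x, 0x2FF0, y, z],
    [0x2FF0, 0x2FF0, x, y, z],
]
all_equal = len({canonical(f) for f in forms}) == 1
```

All three forms reduce to the same tree, so comparing canonical(...) values treats them as one representation. Operators without an entry in CHAIN keep their nesting, so nothing beyond the stated equivalence is assumed.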


    > Andrew


    This archive was generated by hypermail 2.1.5 : Wed Oct 31 2007 - 06:54:42 CST