RE: Variation selectors and vowel marks

From: Peter Constable (petercon@microsoft.com)
Date: Sat Apr 24 2004 - 09:30:31 EDT


    > Yes, problems do arise if there is more than one combining character
    > between the base character and the VS and they are not in canonical
    > order. But this is a marginal case which can be avoided by ensuring
    > that canonical order is always used.

    If data were always encoded in canonical order, then having a VS within
    the combining mark sequence wouldn't create any normalization problems,
    that's true. But you well know that people do not want their Hebrew data
    in canonical order. Even if they did, canonical order couldn't be
    guaranteed.

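    There's a problem not only in cases of the form B M1 M2 VS, but also in
    cases of the form B M1 VS M2, though the issues are different. The
    first may normalize to B M2 M1 VS, which leaves the VS following the
    wrong mark; the second perhaps *ought* to normalize to B M2 M1 VS, but
    that won't happen, because the VS has combining class 0 and so blocks
    any reordering of M1 and M2 across it.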
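    The two cases can be tried out with a short Python sketch. The choice of
    characters is an assumption made purely for illustration: bet (U+05D1) as
    B, dagesh (U+05BC, class 21) as M1, patah (U+05B7, class 17) as M2, and
    VS1 (U+FE00) as the VS. No such variation sequence is actually defined;
    the point is only how normalization treats the code points.

```python
import unicodedata

# Illustrative characters only; VS1 after a Hebrew point is NOT a
# defined variation sequence.
BET = "\u05D1"     # B:  base letter, combining class 0
DAGESH = "\u05BC"  # M1: combining class 21
PATAH = "\u05B7"   # M2: combining class 17
VS1 = "\uFE00"     # VS: combining class 0


def show(text):
    """Render a string as a list of code points, e.g. 'U+05D1 U+05BC'."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# Case 1: B M1 M2 VS, with the VS meant for M2 (patah).
case1 = BET + DAGESH + PATAH + VS1
# Case 2: B M1 VS M2, with the VS meant for M1 (dagesh).
case2 = BET + DAGESH + VS1 + PATAH

nfd1 = unicodedata.normalize("NFD", case1)
nfd2 = unicodedata.normalize("NFD", case2)

for label, before, after in (("case 1", case1, nfd1), ("case 2", case2, nfd2)):
    print(label)
    print("  input:", show(before))
    print("  NFD:  ", show(after))
    print("  changed:", before != after)
```

    In case 1 the marks are reordered by combining class, so the VS ends up
    after the dagesh rather than the patah. In case 2 the string comes back
    unchanged, since the class-0 VS sits between the two marks.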

    The only way to accommodate VSs within combining mark sequences would be
    to define a set of VSs that pick up their canonical combining class from
    the immediately preceding character. But since VSs can only be used in
    explicitly specified combinations, it might be less hassle to simply add
    specific variation modifiers for specific combining marks (said
    modifiers being combining characters with the same combining class as
    the mark they modify). If you get to that point, though, you start to
    wonder whether adding a new combining mark would meet the need just as
    well, without architecting entirely new encoding mechanisms.
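    To make the first idea concrete, here is a rough Python sketch of
    canonical reordering in which a VS inherits the combining class of the
    character before it. This is not how Unicode normalization works today;
    `effective_classes` and `reorder_with_inheriting_vs` are hypothetical
    names, only VS1..VS16 are handled, and the input is assumed to be fully
    decomposed already. The sample characters (bet, dagesh, patah, VS1) are
    likewise just an illustration.

```python
import unicodedata

# Hypothetical scheme: VS1..VS16 (U+FE00..U+FE0F) take on the combining
# class of the immediately preceding character instead of class 0.
VS_CODE_POINTS = range(0xFE00, 0xFE10)


def effective_classes(text):
    """Return one combining class per character, with VSs inheriting."""
    classes = []
    for index, ch in enumerate(text):
        ccc = unicodedata.combining(ch)
        if ord(ch) in VS_CODE_POINTS and index > 0:
            ccc = classes[index - 1]  # inherit from the preceding character
        classes.append(ccc)
    return classes


def reorder_with_inheriting_vs(text):
    """Stable-sort each run of non-zero classes, as canonical ordering does.

    Assumes `text` is already fully decomposed.
    """
    out, run = [], []
    for ch, ccc in zip(text, effective_classes(text)):
        if ccc == 0:
            # A class-0 character ends the current run of marks.
            out.extend(c for c, _ in sorted(run, key=lambda pair: pair[1]))
            run = []
            out.append(ch)
        else:
            run.append((ch, ccc))
    out.extend(c for c, _ in sorted(run, key=lambda pair: pair[1]))
    return "".join(out)


BET, DAGESH, PATAH, VS1 = "\u05D1", "\u05BC", "\u05B7", "\uFE00"

# B M1 VS M2: under this scheme the VS travels with M1 (dagesh).
reordered = reorder_with_inheriting_vs(BET + DAGESH + VS1 + PATAH)
print(" ".join(f"U+{ord(ch):04X}" for ch in reordered))
```

    Because Python's sort is stable, a mark and the VS that follows it keep
    their relative order, so B M1 VS M2 comes out as B M2 M1 VS. A VS placed
    directly after the base keeps class 0 and still acts as a blocker.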

     
    > An alternative of course would be to define a special VS with the same
    > combining class as the character it applies to, so that the two will
    > always remain together. Thus there would potentially be the need for a
    > considerable set of VSs. But I don't think this is really necessary.

    I think that would be better than having general VSs used with combining
    marks.

    Peter Constable


