Re: Cyrillic - accented/acuted vowels

From: Kenneth Whistler (kenw@sybase.com)
Date: Fri May 06 2005 - 19:02:04 CDT

    > On 06/05/2005 21:35, Philippe Verdy wrote:
    >
    > > ...
    > >
    > > So a good question is:
    > > Can a "Unicode Named Character Sequence" be recognized as a single
    > > entity, when there are other combining characters in the middle of the
    > > sequence,

    No. The specification should be clear.

    A Unicode Named Character Sequence is a specific sequence of
    code points associated with a name.

    It is not a maximal set of canonically equivalent sequences of
    code points associated with a name.
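
    The distinction can be sketched in a few lines of Python. This is
    only an illustration under stated assumptions: the one-entry table,
    the name in it, and the helper contains_named_sequence are
    hypothetical (real data lives in NamedSequences.txt), and the exact
    code point match is one reading of "a specific sequence".

```python
import unicodedata

# Hypothetical one-entry table for illustration only; the name is made up.
NAMED_SEQUENCES = {
    "EXAMPLE A WITH OGONEK AND ACUTE": "\u0105\u0301",
}

def contains_named_sequence(text, name):
    """Exact code point match only; no normalization is applied."""
    return NAMED_SEQUENCES[name] in text

name = "EXAMPLE A WITH OGONEK AND ACUTE"
trailing = "\u0105\u0301\u0323"     # the sequence, then COMBINING DOT BELOW
interleaved = "\u0105\u0323\u0301"  # COMBINING DOT BELOW in the middle

# The two strings are canonically equivalent (same NFD form) ...
same = unicodedata.normalize("NFD", trailing) == unicodedata.normalize("NFD", interleaved)
print(same)

# ... but only the first contains the specific sequence of code points.
print(contains_named_sequence(trailing, name))
print(contains_named_sequence(interleaved, name))
```

    The two inputs normalize to the same string, yet only the first
    contains the named sequence as written.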

    > > and when moving those extra combining characters at end of
    > > the named sequence is still canonically equivalent? My opinion is that
    > > such named sequence should still be recognized (due to the canonical
    > > equivalence), to help for interoperability.
    > >
    > I agree,

    And I disagree, because this is not the problem that
    Unicode Named Character Sequences were aimed at.

    > and this would certainly be necessary if Unicode Named
    > Character Sequences are to be defined for Hebrew,

    They are not now, and there is no reason to think that they
    will be in the future.

    > e.g. for such
    > meaningful concepts as HEBREW LETTER SIN WITH DOT and HEBREW LETTER SHIN
    > WITH DOT, because these are commonly combined with other combining
    > characters of lower combining class than SIN DOT and SHIN DOT.

    Such textual elements are already represented using the
    standard, as either:

    <U+05E9, U+05C2>

    or as:

    U+FB2B HEBREW LETTER SHIN WITH SIN DOT

    The two are canonically equivalent.
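
    The equivalence can be checked with Python's standard unicodedata
    module. This is a quick sketch, not part of the standard itself; the
    variable names are illustrative. It also prints the canonical
    combining classes of DAGESH, SHIN DOT and SIN DOT, and shows how a
    dagesh typed after the sin dot is reordered in front of it.

```python
import unicodedata

decomposed = "\u05E9\u05C2"  # HEBREW LETTER SHIN + HEBREW POINT SIN DOT
precomposed = "\uFB2B"       # single encoded character

# Canonically equivalent strings have identical NFD forms.
nfd_equal = (unicodedata.normalize("NFD", decomposed)
             == unicodedata.normalize("NFD", precomposed))
print(nfd_equal)
print(unicodedata.name(precomposed))

# Combining classes of DAGESH (U+05BC), SHIN DOT (U+05C1), SIN DOT (U+05C2).
classes = [unicodedata.combining(c) for c in "\u05BC\u05C1\u05C2"]
print(classes)

# Canonical ordering moves the lower-class dagesh ahead of the sin dot.
reordered = unicodedata.normalize("NFD", "\u05E9\u05C2\u05BC")
print([f"U+{ord(c):04X}" for c in reordered])
```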

    Creating a name for the first sequence would be pointless, since
    there already *is* a character name for a canonically equivalent
    encoded character. And besides, nobody is requiring formal names to
    be given for every character sequence that might be used -- particularly
    when you start considering for Hebrew all the potential sequences
    that could be involved in Biblical text representation.

    Trying to invent some "meaningful concept" for HEBREW LETTER
    SIN WITH DOT that differs in some way from the two representations
    above is just a recipe for *non*-interoperability with the
    standard and its implementations, rather than any help.

    Or perhaps what you really have in mind is:

    HEBREW LETTER SIN WITH DOT BECAUSE THE UTC SCREWED UP THE
       CANONICAL CLASS ASSIGNMENT OF HEBREW COMBINING MARKS
       
    Would that suffice?

    --Ken
