Re: Named character sequences and canonical equivalence, was: Cyrillic - accented/acuted vowels

From: Markus Scherer (markus.icu@gmail.com)
Date: Tue May 10 2005 - 12:36:51 CDT

    On 5/9/05, Philippe Verdy <verdy_p@wanadoo.fr> wrote:
    > So why standardizing named character sequences, if they don't have their own
    > semantic in other related standards or mapping tables where they HAVE a
    > semantic?

    They have the semantics that their sequences of code points have
    already. They are useful for someone who is looking for how to express
    some "user character" in Unicode when that requires more than one code
    point.
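
To illustrate the point, here is a minimal sketch of what a named character sequence amounts to: a name attached to an ordinary sequence of code points, with no semantics beyond what those code points already carry. The table `NAMED_SEQUENCES` and the helper `expand_named_sequence` are hypothetical names for this example; the single entry is hand-copied in the style of the Unicode NamedSequences.txt data file and is not a complete list.

```python
# Hypothetical one-entry table in the style of NamedSequences.txt:
# a name mapped to the code points that make up the "user character".
NAMED_SEQUENCES = {
    "LATIN SMALL LETTER R WITH TILDE": (0x0072, 0x0303),
}


def expand_named_sequence(name: str) -> str:
    """Return the plain code point sequence behind a named sequence.

    Raises KeyError if the name is not in the (illustrative) table.
    """
    return "".join(chr(cp) for cp in NAMED_SEQUENCES[name])


text = expand_named_sequence("LATIN SMALL LETTER R WITH TILDE")

# The result is just two ordinary code points: 'r' + COMBINING TILDE.
code_points = [f"U+{ord(ch):04X}" for ch in text]
print(code_points)
```

Someone who already knows to type r followed by U+0303 gets exactly the same string without ever consulting the name, which is why ignoring the names costs nothing.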

    > ... almost
    > everybody will ignore them...

    I think that's fine. If you know how to express your text as a
    sequence of code points, or if you have a keyboard or input method
    that performs well for you, you may not need Named Character Sequences.
    Not every user of Unicode needs every property and companion standard
    and technical report and what all comes with it.

    > ... notably because they already are not stable
    > under normalization).

    Ä=A-umlaut (U+00C4) is not "stable under normalization" either, but I
    know a few people who are happy to use it anyway.
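
A small sketch of the A-umlaut remark, using Python's standard `unicodedata` module (the variable names are only for illustration): the precomposed character changes under NFD, and the decomposed sequence changes under NFC, yet the two forms are canonically equivalent.

```python
import unicodedata

precomposed = "\u00C4"  # LATIN CAPITAL LETTER A WITH DIAERESIS

# NFD splits the precomposed character into base letter + combining mark.
decomposed = unicodedata.normalize("NFD", precomposed)

# NFC recomposes the pair back into the single code point.
recomposed = unicodedata.normalize("NFC", decomposed)

print([f"U+{ord(ch):04X}" for ch in decomposed])
print(recomposed == precomposed)
```

So "not stable under normalization" describes U+00C4 under NFD just as it describes some sequences under NFC; neither fact makes the text unusable.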

    > These should have consequences too when implementing collation...

    Most of the time, sorting of strings can be properly implemented by
    defining sorting of single characters. Sometimes, you need to add a
    contraction, but not every meaningful sequence needs or should have a
    contraction. UCA, and UCA-based tailorings, use contractions sparingly
    but do use them when necessary.
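
As a rough sketch of what a contraction does, the toy sort key below assigns one weight per collation element and treats "ch" as a single element ordered after "c", as in traditional Spanish sorting. This is not the UCA algorithm and not any library's API: `make_sort_key` and the `order` list are assumptions made up for the example, with a single weight level and no normalization step.

```python
def make_sort_key(order):
    """Build a sort key from an ordered list of collation elements.

    Elements longer than one character act as contractions and are
    matched before shorter elements (longest match first).
    """
    weights = {element: i for i, element in enumerate(order)}
    longest = max(len(element) for element in order)

    def key(s):
        out = []
        i = 0
        while i < len(s):
            for n in range(longest, 0, -1):
                chunk = s[i:i + n]
                if chunk in weights:
                    out.append(weights[chunk])
                    i += len(chunk)
                    break
            else:
                # Unknown character: sort after all listed elements.
                out.append(len(weights) + ord(s[i]))
                i += 1
        return out

    return key


# Single letters, plus one contraction "ch" placed between "c" and "d".
order = list("abc") + ["ch"] + list("defghijklmnopqrstuvwxyz")
key = make_sort_key(order)

words = ["cuna", "chico", "dama", "casa"]
print(sorted(words))           # plain code point order
print(sorted(words, key=key))  # "ch" sorts after every other "c..."
```

Only the one sequence that actually sorts differently gets an entry; every other word is handled by the single-character weights alone.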

    markus

    -- 
    Opinions expressed here may not reflect my company's positions unless
    otherwise noted.
    

