Re: Nicest UTF - string case mapping vs. UTF-8/32

From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Fri Dec 03 2004 - 14:07:23 CST


    I feel the need to correct one misperception:

    Lars Kristan wrote:
    > 4.1 - UTF-32 is probably very useful for certain string operations.
    > Changing case for example. You can do it in-place, like you could with
    > ASCII. Perhaps it can even be done in UTF-8, I am not sure. But even if
    > it is possible today, it is definitely not guaranteed that it will
    > always remain so, so one shouldn't rely on it.

    Wrong even for UTF-32. Sharp s (U+00DF) uppercases to two characters, "SS". Other examples of case
    mapping expansion and contraction are in SpecialCasing.txt (one of the UCD files).

    For UTF-8, there are also _simple_ (1:1) case mappings that change the length (e.g., long s
    (U+017F, two bytes) to S (U+0053, one byte)), while sharp s to SS happens not to change the
    UTF-8 string length (two bytes either way)...
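
    The two cases above can be checked with a short sketch. It assumes Python 3, whose
    str.upper() applies full (SpecialCasing) mappings; the helper names code_units and
    lengths_before_after are made up for this illustration and are not from any library.

```python
# Hypothetical helper: number of code units a string occupies in an encoding form.
def code_units(text: str, form: str) -> int:
    if form == "utf-8":
        return len(text.encode("utf-8"))           # one code unit per byte
    if form == "utf-32":
        return len(text.encode("utf-32-le")) // 4  # one code unit per 4 bytes, no BOM
    raise ValueError(f"unsupported form: {form}")


# Hypothetical helper: (length before, length after) uppercasing, per encoding form.
def lengths_before_after(text: str) -> dict:
    upper = text.upper()  # full case mapping, so sharp s becomes "SS"
    return {
        form: (code_units(text, form), code_units(upper, form))
        for form in ("utf-8", "utf-32")
    }


sharp_s = lengths_before_after("\u00df")  # sharp s -> "SS"
long_s = lengths_before_after("\u017f")   # long s  -> "S"

print("sharp s:", sharp_s)
print("long s: ", long_s)
```

    Under those assumptions, sharp s grows from one to two code units in UTF-32 while staying at
    two bytes in UTF-8, and long s shrinks from two bytes to one in UTF-8 while staying at one
    code unit in UTF-32. Neither encoding form can be case-mapped in place in general.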

    markus

    PS: I wrote UTN #12 :-)

    -- 
    Opinions expressed here may not reflect my company's positions unless otherwise noted.
    

