Re: Differences between UnicodeData.txt and SpecialCasing.txt Case Mappings

From: Addison Phillips (addison@yahoo-inc.com)
Date: Thu Oct 19 2006 - 18:05:32 CST


    Hi Andrew,

    Andrew Miller wrote:
    > There appear to be a number of differences in the case mappings defined
    > in UnicodeData.txt and SpecialCasing.txt

    This is as it should be. Right at the top of SpecialCasing.txt it says:

    # This file is a supplement to the UnicodeData file.
    # It contains additional information about the casing of Unicode characters.
    # (For compatibility, the UnicodeData.txt file only contains case mappings for
    # characters where they are 1-1, and does not have locale-specific mappings.)
    # For more information, see the discussion of Case Mappings in the Unicode Standard.

    In other words, SpecialCasing.txt is where you will find the case
    mappings that UnicodeData.txt cannot express: those whose result has
    more code points than the source character, plus the locale-specific
    ones.
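
    As a quick illustration of a mapping that is not 1-1, here is a
    minimal Python sketch. It assumes a Python 3 interpreter whose
    str.lower() applies the full (SpecialCasing-style) mapping; the
    helper name describe_lowercase is made up for this example.

```python
def describe_lowercase(ch):
    """Return the code points produced by lowercasing a single character."""
    # str.lower() may return more than one code point for one input character.
    return [f"U+{ord(c):04X}" for c in ch.lower()]


# U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE lowercases to two code points.
print(describe_lowercase("\u0130"))  # ['U+0069', 'U+0307']
# Plain ASCII "I" stays 1-1.
print(describe_lowercase("I"))       # ['U+0069']
```

    A result with more than one code point is exactly the kind of
    mapping that cannot fit in the single-code-point fields of
    UnicodeData.txt.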

    >
    > For example, U+0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE) has a
    > lowercase mapping of U+0069 in UnicodeData.txt and a mapping of U+0069
    > U+0307 in SpecialCasing.txt.
    >
    > All of the greek YPOGEGRAMMENI letters in SpecialCasing.txt have
    > different uppercase mappings to those specified in UnicodeData.txt
    >
    > Can I just ignore the UnicodeData.txt mappings for these characters, and
    > just use the ones defined in SpecialCasing.txt instead?
    >

    Not entirely. The bottom part of the file contains locale-specific
    mappings. These should be applied only for the specific
    languages/locales they name and not elsewhere. For example:

    # When uppercasing, i turns into a dotted capital I

    0069; 0069; 0130; 0130; tr; # LATIN SMALL LETTER I
    0069; 0069; 0130; 0130; az; # LATIN SMALL LETTER I

    You wouldn't want the letter "i" to become İ (U+0130) under "normal"
    (i.e. non-Turkish/non-Azerbaijani) circumstances.
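
    To make the distinction concrete, here is a rough Python sketch of
    how one might read such lines and apply a conditional entry only
    when the locale matches. The names SpecialCasingEntry,
    parse_special_casing_line and upper_char are invented for this
    example, the locale is assumed to be a bare language code such as
    "tr", and context conditions (anything other than a single
    language code) are simply ignored here.

```python
from typing import NamedTuple, Optional, Tuple


class SpecialCasingEntry(NamedTuple):
    code: str
    lower: str
    title: str
    upper: str
    conditions: Tuple[str, ...]  # empty tuple means unconditional


def _to_str(field: str) -> str:
    """Convert a space-separated list of hex code points to a string."""
    return "".join(chr(int(cp, 16)) for cp in field.split())


def parse_special_casing_line(line: str) -> Optional[SpecialCasingEntry]:
    """Parse 'code; lower; title; upper; (conditions;)? # comment'."""
    data = line.split("#", 1)[0].strip()
    if not data:
        return None  # blank line or pure comment
    fields = [f.strip() for f in data.split(";")]
    if fields[-1] == "":
        fields.pop()  # the trailing ';' leaves an empty last field
    if len(fields) < 4:
        raise ValueError(f"malformed line: {line!r}")
    conditions = tuple(fields[4].split()) if len(fields) > 4 else ()
    return SpecialCasingEntry(
        _to_str(fields[0]), _to_str(fields[1]),
        _to_str(fields[2]), _to_str(fields[3]), conditions,
    )


def upper_char(ch, entries, locale=None):
    """Uppercase one character, honouring locale-conditional entries."""
    # First pass: an entry whose only condition is the requested locale.
    for e in entries:
        if e.code == ch and locale is not None and e.conditions == (locale,):
            return e.upper
    # Second pass: an unconditional entry for this character.
    for e in entries:
        if e.code == ch and not e.conditions:
            return e.upper
    # Otherwise fall back to the default, locale-independent mapping.
    return ch.upper()


SAMPLE = """\
# When uppercasing, i turns into a dotted capital I
0069; 0069; 0130; 0130; tr; # LATIN SMALL LETTER I
0069; 0069; 0130; 0130; az; # LATIN SMALL LETTER I
"""

entries = [e for e in map(parse_special_casing_line, SAMPLE.splitlines()) if e]
print(upper_char("i", entries, locale="tr") == "\u0130")  # True
print(upper_char("i", entries))                           # I
```

    The point of the two passes is that a conditional entry never
    leaks into the default case: without a matching locale, "i" still
    uppercases to plain "I".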

    Hope that helps.

    Addison

    -- 
    Addison Phillips
    Globalization Architect -- Yahoo! Inc.
    Internationalization is an architecture.
    It is not a feature.
    

