Re: Problem with accented characters

From: Deborah Goldsmith (goldsmit@apple.com)
Date: Mon Aug 23 2004 - 14:53:33 CDT

    FYI, by far the largest source of text in NFD (decomposed) form in Mac
    OS X is the file system. File names are stored this way (for historical
    reasons), so anything copied from a file name is in (a slightly altered
    form of) NFD.

    Also, a few keyboard layouts generate text that is partly decomposed,
    for ease of typing (e.g., Vietnamese).

    Deborah Goldsmith
    Internationalization, Unicode liaison
    Apple Computer, Inc.
    goldsmit@apple.com
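
A minimal sketch of what the file-name point implies in practice, assuming Python's standard unicodedata module is an acceptable stand-in for whatever normalization library the application uses. The helper name find_matching_names is hypothetical, and the sketch uses standard NFC rather than the slightly altered decomposed form the file system stores. The idea is only that a name typed in precomposed form and a name copied from the file system should be compared after both are brought to the same normalization form.

```python
import unicodedata


def nfc(text: str) -> str:
    """Return text in Normalization Form C (precomposed where possible)."""
    return unicodedata.normalize("NFC", text)


def find_matching_names(wanted: str, names: list[str]) -> list[str]:
    """Hypothetical helper: return the entries of names that are
    canonically equivalent to wanted, whatever form each one is in."""
    key = nfc(wanted)
    return [name for name in names if nfc(name) == key]


if __name__ == "__main__":
    # A decomposed name, as it might come back from a directory listing.
    listing = ["re\u0301sume\u0301.txt", "notes.txt"]
    # The same name as a user might type it, in precomposed form.
    typed = "r\u00e9sum\u00e9.txt"
    print(typed in listing)                      # plain comparison misses it
    print(find_matching_names(typed, listing))   # normalized comparison finds it
```

The same approach covers partly decomposed keyboard input, since normalizing both sides removes the difference between fully and partly decomposed sequences.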

    On Aug 23, 2004, at 11:51 AM, Doug Ewell wrote:

    > William Tay wrote:
    >
    >> Can anyone explain why an accented character is sometimes represented
    >> as a base character plus its accent? For example, the utf-8
    >> representation for é is 65 CC 81, which is the utf-8 representation
    >> for e and the accent, instead of C3 A9? I find that this is how MacOS
    >> X represents accented characters.
    >
    > The two characters U+0065 and U+0301 (é) are canonically equivalent to
    > the single character U+00E9 (é). That is, the two-character combining
    > sequence is supposed to be considered equivalent to the single
    > precomposed character. Apparently MacOS X, or at least one application
    > running under it, does use the combining sequence.
    >
    >> How can a C application that receives such utf-8 encoded characters
    >> handle them correctly? Appreciate your comments.
    >
    > It must understand normalization. See TUS 4.0, section 5.6 for more
    > information.
    >
    > -Doug Ewell
    > Fullerton, California
    > http://users.adelphia.net/~dewell/
    >
    >
    >
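
The byte sequences discussed in the quoted exchange can be checked directly. The following is a sketch in Python, chosen here only because its standard unicodedata module makes the example self-contained. A C application would need a normalization library (ICU is one option) and the calls would differ. The function name canonically_equal is an assumption for illustration.

```python
import unicodedata

# The two UTF-8 encodings from the question.
decomposed = bytes.fromhex("65cc81").decode("utf-8")   # U+0065 U+0301
precomposed = bytes.fromhex("c3a9").decode("utf-8")    # U+00E9


def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to the same form (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    # Code point by code point the strings differ ...
    print([f"U+{ord(c):04X}" for c in decomposed])
    print([f"U+{ord(c):04X}" for c in precomposed])
    print(decomposed == precomposed)
    # ... but they are canonically equivalent.
    print(canonically_equal(decomposed, precomposed))
    # Normalizing in either direction converts one byte sequence to the other.
    print(unicodedata.normalize("NFC", decomposed).encode("utf-8").hex())
    print(unicodedata.normalize("NFD", precomposed).encode("utf-8").hex())
```

Normalizing incoming text once, at the point where it enters the application, keeps later comparisons and searches simple, because every string is then in one known form.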


