Re: Looking for a C library that converts UTF-8 strings from their decomposed to pre-composed form

From: Deborah Goldsmith (
Date: Mon Nov 08 2004 - 20:57:04 CST


    I think he's saying he wants to convert to NFC *from* Mac OS X data, in
    which case the fact that Mac OS X's file system normalization is not
    strict NFD doesn't really matter. Also, he says he's running on
    Solaris, which would make it a tad difficult to call a Mac OS X API.
    ICU should do the trick.

    It's worth pointing out that there is no such thing as "precomposed
    Unicode". Normalization form C (NFC) could be called "as precomposed as
    possible." Some Unicode character sequences can only be expressed
    using combining marks.

    Deborah Goldsmith
    Internationalization, Unicode liaison
    Apple Computer, Inc.

    On Nov 8, 2004, at 5:17 PM, Markus Scherer wrote:

    > Tay, William wrote:
    >> Is there any C library available that converts the decomposed UTF-8
    >> byte streams into the pre-composed equivalent?
    > MacOS X does decompose filenames, but it does not use standard Unicode
    > normalization (because it was designed before Unicode's normalization
    > was finalized.) I suggest you search the mailing list archive for this
    > list for more details. You probably need to use a MacOS system function.
    > ICU has options for normalization (some defined with internal
    > constants only) which may or may not match, or get close to, MacOS
    > filename normalization:
    > markus

    This archive was generated by hypermail 2.1.5 : Mon Nov 08 2004 - 20:58:33 CST