Decomposed vs Composed accented characters

From: Tay, William (William.Tay@xerox.com)
Date: Thu Apr 06 2006 - 11:53:12 CST

    Hi,

    I have a C/C++ UNIX application that uses standard UTF-8 as its internal
    text encoding. If it receives a UTF-8 encoded decomposed accented
    character, i.e. a base character followed by a combining accent, from a
    Mac OS X application, it needs to detect that the character is
    decomposed and then compose it prior to further processing. Are there
    any Solaris/UNIX utilities or library functions that can help my
    application do the detection and character composition?
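
    To make the detect-then-compose step concrete, here is a minimal sketch.
    It is written in Python rather than C/C++ only because Python's standard
    unicodedata module makes it self-contained; the helper names is_composed
    and compose are hypothetical, and "composed" is assumed to mean Unicode
    Normalization Form C (NFC). In a C/C++ application the same step would
    need a Unicode library (ICU is one that offers normalization).

```python
import unicodedata


def is_composed(utf8_bytes: bytes) -> bool:
    """Return True if the UTF-8 text is already in NFC (composed) form."""
    text = utf8_bytes.decode("utf-8")
    # Text is in NFC exactly when normalizing it to NFC changes nothing.
    return unicodedata.normalize("NFC", text) == text


def compose(utf8_bytes: bytes) -> bytes:
    """Return the UTF-8 text converted to NFC (composed) form."""
    text = utf8_bytes.decode("utf-8")
    return unicodedata.normalize("NFC", text).encode("utf-8")


# 'e' followed by U+0301 COMBINING ACUTE ACCENT (decomposed form)
decomposed = "e\u0301".encode("utf-8")
# U+00E9 LATIN SMALL LETTER E WITH ACUTE (composed form)
composed = "\u00e9".encode("utf-8")

if not is_composed(decomposed):
    decomposed_fixed = compose(decomposed)
```

    Input that is already composed passes through compose unchanged, so
    the same call can be applied to text from any platform.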

    Now, the application that sent the decomposed accented character may
    later query my application to get the character back. If my application
    has already composed the character, won't that be a problem for the
    querying application, since it expects to receive the character in its
    decomposed form?
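
    The round trip in question can be sketched as follows. This assumes the
    querying application expects Unicode Normalization Form D (NFD); if it
    uses some other decomposition, the original bytes would have to be kept
    alongside the composed copy instead. The helper name same_text is
    hypothetical.

```python
import unicodedata

# As received: "Cafe" + U+0301 COMBINING ACUTE ACCENT
original = "Cafe\u0301"

# Composed (NFC) for internal processing
stored = unicodedata.normalize("NFC", original)

# Decomposed again (NFD) before handing it back to the sender
returned = unicodedata.normalize("NFD", stored)


def same_text(a: str, b: str) -> bool:
    """Compare two strings while ignoring composed/decomposed differences."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

    Because NFC and NFD are canonically equivalent, composing and then
    decomposing yields the NFD form of the original text, and same_text
    treats the stored and received strings as equal.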

    My application interacts not only with Mac OS X applications but also
    with applications on other platforms. So, I'm not always receiving
    accented characters in their decomposed form.

    How do you think I should implement my application so that it takes care
    of decomposed and composed UTF-8 characters effectively?

    Can accented characters be decomposed in other encodings, e.g. ISO
    8859-1, as well?
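
    For ISO 8859-1 specifically, a quick check is possible: its repertoire
    corresponds to U+0000 through U+00FF, which contains precomposed
    accented letters and spacing accents but no combining marks. The sketch
    below (Python, with a hypothetical helper fits_latin1) shows that the
    composed form encodes to one byte while the decomposed form cannot be
    encoded at all.

```python
def fits_latin1(text: str) -> bool:
    """Return True if every character of text exists in ISO 8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


composed = "\u00e9"      # precomposed e with acute accent
decomposed = "e\u0301"   # 'e' + U+0301 COMBINING ACUTE ACCENT

composed_ok = fits_latin1(composed)
decomposed_ok = fits_latin1(decomposed)
```

    Here composed_ok is True and decomposed_ok is False, because U+0301
    lies outside the range ISO 8859-1 can represent.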

    Btw, what common applications/operating systems generate decomposed
    accented characters?

    Thanks.

    Will


