Re: Decomposed vs Composed accented characters

From: Mark Leisher (mleisher@crl.nmsu.edu)
Date: Thu Apr 06 2006 - 14:04:09 CST

    Tay, William wrote:
    > Hi,
    >
    > I have a C/C++ UNIX application that uses standard UTF-8 as the internal
    > text encoding. If it receives a UTF-8 encoded decomposed accented
    > character, i.e. base character + accent, from a MacOS X application, it
    > would need to be able to detect that the character was decomposed, and
    > then compose it prior to further processing. Are there any Solaris/UNIX
    > utilities or functions that can help my application do the detection and
    > character composition?
    >
    > Now, the application from which the decomposed accented character
    > originated may query my application so that the character is returned to
    > it. If my application has already composed the character, won't it be a
    > problem for the querying application, since it expects to receive the
    > character in its decomposed format?
    >
    > My application interacts not only with MacOS X applications but also
    > with others that run on different platforms. So, I'm not always
    > receiving accented characters in their decomposed format.
    >
    > How do you think I should implement my application so that it takes care
    > of decomposed and composed UTF-8 characters effectively?
    >
    > Can accented characters be decomposed in other encodings, e.g. ISO
    > 8859-1, as well?
    >
    > Btw, what common applications/operating systems generate decomposed
    > accented characters?
    >

    You can play with http://crl.nmsu.edu/~mleisher/ucdata.html. Version 2.9
    does not have composition/decomposition for UTF-8 strings, but version
    3.0 will be released soon (probably within the next few weeks), and it
    does support UTF-8 composition/decomposition.
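
As a rough illustration of what "detect and compose" involves for UTF-8 input, here is a minimal C sketch. It is not the ucdata API: the table, the struct, and the function names (compose_utf8 and its helpers) are made up for this example. It assumes well-formed, NUL-terminated UTF-8 and knows only four base + combining-mark pairs. A real implementation would take its data from the Unicode Character Database and follow the full NFC rules (canonical reordering, composition exclusions), none of which are attempted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, deliberately tiny composition table:
 * (base, combining mark) -> precomposed code point.
 * Real data would come from the Unicode Character Database. */
struct comp_pair { uint32_t base, mark, composed; };
static const struct comp_pair comp_table[] = {
    { 0x0065, 0x0301, 0x00E9 },  /* e + combining acute     -> U+00E9 */
    { 0x0061, 0x0300, 0x00E0 },  /* a + combining grave     -> U+00E0 */
    { 0x0075, 0x0308, 0x00FC },  /* u + combining diaeresis -> U+00FC */
    { 0x006E, 0x0303, 0x00F1 },  /* n + combining tilde     -> U+00F1 */
};

/* Decode one code point; returns the number of bytes consumed.
 * Assumes well-formed UTF-8 and performs no validation. */
static size_t utf8_decode(const unsigned char *s, uint32_t *cp)
{
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    if ((s[0] & 0xE0) == 0xC0) {
        *cp = ((uint32_t)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0) {
        *cp = ((uint32_t)(s[0] & 0x0F) << 12) |
              ((uint32_t)(s[1] & 0x3F) << 6) | (s[2] & 0x3F);
        return 3;
    }
    *cp = ((uint32_t)(s[0] & 0x07) << 18) | ((uint32_t)(s[1] & 0x3F) << 12) |
          ((uint32_t)(s[2] & 0x3F) << 6) | (s[3] & 0x3F);
    return 4;
}

/* Encode one code point as UTF-8; returns the number of bytes written. */
static size_t utf8_encode(uint32_t cp, unsigned char *o)
{
    if (cp < 0x80) { o[0] = (unsigned char)cp; return 1; }
    if (cp < 0x800) {
        o[0] = (unsigned char)(0xC0 | (cp >> 6));
        o[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        o[0] = (unsigned char)(0xE0 | (cp >> 12));
        o[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        o[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    o[0] = (unsigned char)(0xF0 | (cp >> 18));
    o[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    o[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    o[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Return the precomposed form of (base, mark), or 0 if the table has none. */
static uint32_t comp_lookup(uint32_t base, uint32_t mark)
{
    for (size_t i = 0; i < sizeof comp_table / sizeof comp_table[0]; i++)
        if (comp_table[i].base == base && comp_table[i].mark == mark)
            return comp_table[i].composed;
    return 0;
}

/* Copy `in` to `out`, replacing every pair found in comp_table with its
 * precomposed form. Returns 1 if at least one pair was composed (i.e. the
 * input contained a decomposed sequence this sketch knows), else 0.
 * `out` needs strlen(in) + 1 bytes: with this table the output is never
 * longer than the input. */
int compose_utf8(const char *in, char *out)
{
    assert(in != NULL && out != NULL);
    const unsigned char *s = (const unsigned char *)in;
    unsigned char *o = (unsigned char *)out;
    uint32_t prev = 0;      /* code point held back, waiting for a mark */
    int have_prev = 0, changed = 0;

    while (*s) {
        uint32_t cp;
        s += utf8_decode(s, &cp);
        uint32_t c = have_prev ? comp_lookup(prev, cp) : 0;
        if (c) {            /* prev + cp compose: keep the composed form */
            prev = c;
            changed = 1;
        } else {            /* no composition: flush prev, hold cp */
            if (have_prev) o += utf8_encode(prev, o);
            prev = cp;
            have_prev = 1;
        }
    }
    if (have_prev) o += utf8_encode(prev, o);
    *o = '\0';
    return changed;
}
```

Calling compose_utf8 on the bytes for "cafe" followed by U+0301 (0xCC 0x81) writes "caf" followed by U+00E9 (0xC3 0xA9) and returns 1. Input that is already composed is copied through unchanged and returns 0, so the return value doubles as the detection step.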

    -- 
    ------------------------------------------------------------------------
    Mark Leisher
    Computing Research Lab              They never open their mouths
    New Mexico State University         without subtracting from the
    Box 30001, MSC 3CRL                 sum of human knowledge.
    Las Cruces, NM  88003                 -- Thomas Brackett Reed (1839-1902)
    

