RE: Implementing on UTF8: toUpper(), toFold(), normalisation, collation, etc

From: Carl W. Brown (cbrown@xnetinc.com)
Date: Sat May 03 2003 - 13:32:52 EDT


    Theodore,

    If you want to implement Unicode support across different platforms, I
    suggest that you look at ICU (http://www-124.ibm.com/icu/). It contains a
    full complement of Unicode support. However, ICU's string functions work
    on UTF-16, and since you want to process UTF-8 data, I suggest that you
    create a function wrapper system to handle the UTF-8 to UTF-16 transforms
    as part of your functions. You can use xIUA (http://www.xnetinc.com/xiua/)
    as a starting point. It has an interface to ICU that allows you to handle
    UTF-8 data directly, with conversion work area management. It also
    implements UTF-8 specific functions such as strtok. If this is existing
    code, it will also help you manage locale settings in a thread-safe
    manner, so that you do not have to change APIs to pass locale information.
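
    The wrapper idea above can be sketched roughly as follows. This is only
    an illustration in Python, not ICU or xIUA code: Python's built-in str
    methods and the unicodedata module stand in for the ICU calls, and
    Python's internal str type stands in for the UTF-16 work area. The names
    utf8_wrapper, to_upper, to_fold and normalize_nfc are hypothetical.

```python
import unicodedata
from typing import Callable


def utf8_wrapper(func: Callable[[str], str]) -> Callable[[bytes], bytes]:
    """Wrap a text-level function so it takes and returns UTF-8 bytes.

    Assumption: the wrapped function works on decoded text, much as an
    ICU function would work on a UTF-16 buffer.
    """
    def wrapped(data: bytes) -> bytes:
        # Convert in: strict decoding raises UnicodeDecodeError on bad UTF-8.
        text = data.decode("utf-8")
        # Do the real work on the decoded form, then convert back out.
        return func(text).encode("utf-8")
    return wrapped


# Hypothetical wrapped operations; callers only ever see UTF-8 bytes.
to_upper = utf8_wrapper(str.upper)
to_fold = utf8_wrapper(str.casefold)
normalize_nfc = utf8_wrapper(lambda s: unicodedata.normalize("NFC", s))
```

    The point of the design is that the conversion happens in one place, so
    the rest of a UTF-8 based library never has to know which encoding the
    underlying Unicode functions prefer.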

    Carl

    > -----Original Message-----
    > From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org]On
    > Behalf Of Theodore H. Smith
    > Sent: Saturday, May 03, 2003 8:50 AM
    > To: unicode@unicode.org
    > Subject: Implementing on UTF8: toUpper(), toFold(), normalisation,
    > collation, etc
    >
    >
    > Hi list,
    >
    > I need some way to implement toUpper(), toFold(), normalisation,
    > collation, and perhaps other Unicode features I may have missed, on
    > UTF8 strings stored in RAM.
    >
    > I need to implement it for Windows (32-bit), MacOS9 and MacOSX.
    >
    > I have other Unicode processing code, already, but not these or
    > anything close to these.
    >
    > I heard that the only way is to read out the character information from
    > a database? My whole string processing library, with hundreds of
    > functions and a few properties, is only 54k. I don't want to add 200k
    > of database reading code and then huge Unicode database files to this
    > 54k.
    >
    > How is this best done, then? I'm assuming there isn't any mathematical
    > way to figure out a codepoint's properties? So where do I get this data
    > and what's the fastest way to do it?
    >
    > --
    > Theodore H. Smith - Macintosh Consultant / Contractor.
    > My website: <www.elfdata.com/>
    >
    >
    >


