RE: Implementing on UTF8: toUpper(), toFold(), normalisation, collation, etc

From: Carl W. Brown (cbrown@xnetinc.com)
Date: Sat May 03 2003 - 13:32:52 EDT

Next message: Ben Dougall: "Re: Implementing on UTF8: toUpper(), toFold(), normalisation, collation, etc"

Previous message: Addison Phillips [wM]: "Re: Implementing on UTF8: toUpper(), toFold(), normalisation, collation, etc"
In reply to: Theodore H. Smith: "Implementing on UTF8: toUpper(), toFold(), normalisation, collation, etc"
Next in thread: Ben Dougall: "Re: Implementing on UTF8: toUpper(), toFold(), normalisation, collation, etc"

Theodore,

If you want to implement Unicode support across different platforms, I
suggest that you look at ICU: http://www-124.ibm.com/icu/ It contains a
full complement of Unicode support. However, ICU's string APIs work mostly
on UTF-16, so since you want to process UTF-8 data, I suggest that you
create a function wrapper system that handles the UTF-8 to UTF-16
transforms as part of your functions. You can use xIUA
(http://www.xnetinc.com/xiua/) as a starting point. It has an interface to
ICU that allows you to handle UTF-8 data directly, with conversion work
area management. It also implements UTF-8 specific functions such as
strtok. If this is existing code, it should also help you manage locale
settings so that you do not have to change APIs to pass locale information
in a thread-safe manner.
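
To illustrate the wrapper idea only, here is a minimal sketch in Python
rather than C. It is not the ICU or xIUA API. Python's built-in str type and
the standard unicodedata module stand in for the UTF-16 layer, and the names
utf8_api, to_upper, to_fold and normalize_nfc are hypothetical, made up for
this example. The point is that the caller only ever sees UTF-8, while the
transform in and out happens inside the wrapper.

```python
import functools
import unicodedata


def utf8_api(func):
    """Hypothetical decorator: let a text-based function accept and
    return UTF-8 encoded bytes, doing the conversion on entry and exit."""
    @functools.wraps(func)
    def wrapper(data: bytes, *args, **kwargs):
        # Convert from UTF-8 to the internal form (raises
        # UnicodeDecodeError if the input is not well-formed UTF-8).
        text = data.decode("utf-8")
        result = func(text, *args, **kwargs)
        # Convert string results back to UTF-8; pass anything else through.
        return result.encode("utf-8") if isinstance(result, str) else result
    return wrapper


@utf8_api
def to_upper(text):
    # Full (not single-character) upper-casing, so one character may
    # become several.
    return text.upper()


@utf8_api
def to_fold(text):
    # Case folding for caseless comparison.
    return text.casefold()


@utf8_api
def normalize_nfc(text):
    # Canonical composition (NFC) from the standard library tables.
    return unicodedata.normalize("NFC", text)


if __name__ == "__main__":
    print(to_upper("straße".encode("utf-8")))
    print(to_fold("Straße".encode("utf-8")))
    print(normalize_nfc("e\u0301".encode("utf-8")))
```

In a C library the same shape would apply, with the wrapper also owning the
conversion buffers so that callers never allocate UTF-16 work areas
themselves.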

Carl

> -----Original Message-----
> From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org]On
> Behalf Of Theodore H. Smith
> Sent: Saturday, May 03, 2003 8:50 AM
> To: unicode@unicode.org
> Subject: Implementing on UTF8: toUpper(), toFold(), normalisation,
> collation, etc
>
>
> Hi list,
>
> I need some way to implement toUpper(), toFold(), normalisation,
> collation, and perhaps other Unicode features I may have missed,
> on UTF-8 strings stored in RAM.
>
> I need to implement it for Windows (32-bit), MacOS9 and MacOSX.
>
> I have other Unicode processing code, already, but not these or
> anything close to these.
>
> I heard that the only way is to read out the character information from
> a database? My whole string processing library, with hundreds of
> functions and a few properties, is only 54k. I don't want to add 200k
> of database reading code and then huge Unicode database files to this
> 54k.
>
> How is this best done, then? I'm assuming there isn't any mathematical
> way to figure out a codepoint's properties? So where do I get this data
> and what's the fastest way to do it?
>
> --
> Theodore H. Smith - Macintosh Consultant / Contractor.
> My website: <www.elfdata.com/>
>
>
>

This archive was generated by hypermail 2.1.5 : Sat May 03 2003 - 14:19:51 EDT