Re: UTF-16 inside UTF-8

From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Thu Nov 06 2003 - 15:00:00 EST

In reply to: YTang0648@aol.com: "Re: UTF-16 inside UTF-8"

I would like to comment on several statements that I have seen in this thread:

- Migrating from UCS-2 to UTF-16:
   Doable, and it has been done for many applications and libraries.

- Difficult to handle UTF-16?
   Use ICU - it handles all of Unicode for collation,
   regular expressions, string casing, codepage conversion,
   and many other things.

- Support for supplementary characters only for Chinese?
   Japan has defined JIS X 0213 which has characters that map to
   + supplementary characters
   as well as
   + multiple BMP characters
   (ICU 2.8 will support codepage conversion involving
    multiple characters on either side)

   CJKV ideographs, used in several languages, are driving support
   for supplementary characters.

- Case mappings can be modified to return a 32-bit Unicode
   code point instead of a 16-bit BMP code point?
   This works, but only for "simple" case mappings.
   Full Unicode case mappings are defined on strings, so
   single-character APIs cannot implement them at all.
   Full string mappings can map 1:n and are context- and language-sensitive.
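
To illustrate the last point, here is a minimal sketch in Python. It uses Python's built-in str.upper() and str.lower() as a stand-in for a string-based case-mapping API; it is not ICU. The helper simple_upper() is hypothetical and only approximates a "simple" 1:1 mapping by falling back to the input whenever the full mapping needs more than one code point. Language-sensitive mappings (for example the Turkish dotted and dotless i) are not shown, because Python's built-in methods are locale-independent.

```python
def simple_upper(ch: str) -> str:
    """Hypothetical 1:1 mapper: one code point in, one code point out.

    Approximation only: when the full uppercase mapping expands to
    several code points, return the input unchanged.
    """
    mapped = ch.upper()
    return mapped if len(mapped) == 1 else ch


# A 1:1 mapping is fine for a cased supplementary code point:
# DESERET CAPITAL LETTER LONG I (U+10400) maps to U+10428.
deseret = "\U00010400"
deseret_lower = deseret.lower()

# 1:n mapping: LATIN SMALL LETTER SHARP S (U+00DF) uppercases to two
# letters, which a single-character API cannot return.
sharp_s = "\u00df"
one_to_one = simple_upper(sharp_s)   # stays U+00DF
full = sharp_s.upper()               # "SS"

# Context sensitivity: GREEK CAPITAL LETTER SIGMA lowercases to the
# final form U+03C2 at the end of a word, else to U+03C3. Mapping one
# character at a time loses that context.
word = "\u039f\u0394\u039f\u03a3"    # Greek "ODOS" in capitals
per_char = "".join(c.lower() for c in word)
whole = word.lower()

print(hex(ord(deseret_lower)))
print(one_to_one == sharp_s, full)
print(per_char == whole)
```

The per-character result ends in the non-final sigma while the whole-string result ends in the final sigma, so the two differ even though every single code point was mapped "correctly" on its own.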

markus

http://oss.software.ibm.com/icu/

-- 
Opinions expressed here may not reflect my company's positions unless otherwise noted.
