From: Markus Scherer (email@example.com)
Date: Thu Nov 06 2003 - 15:00:00 EST
I would like to comment on several statements that I have seen in this thread:
- Migrating from UCS-2 to UTF-16:
Doable, and has been done for many applications and libraries.
- Difficult to handle UTF-16?
Use ICU - it handles all of Unicode for collation,
regular expressions, string casing, codepage conversion,
and many other things.
- Support for supplementary characters only for Chinese?
Japan has defined JIS X 0213, which has characters that map to
+ supplementary characters
as well as
+ multiple BMP characters
(ICU 2.8 will support codepage conversion involving
multiple characters on either side)
CJKV ideographs, used in several languages, are driving support
for supplementary characters.
- Case mappings can be modified to return a 32-bit Unicode
code point instead of 16-bit BMP?
This works, but only for "simple" case mappings.
Full Unicode case mappings are defined on strings, so
single-character APIs cannot express them at all.
Full string mappings map 1:n (for example, U+00DF uppercases to "SS")
and are context- and language-sensitive.
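
The points about supplementary characters and case mapping can be sketched in Python, whose built-in str.lower() and str.upper() apply full (but not language-specific) case mappings. This is only an illustration under that assumption, not ICU's API, and all variable names are made up for the example.

```python
# Illustration only: uses Python's built-in str methods, not ICU.
# All variable names here are hypothetical.

# A supplementary character takes two 16-bit code units in UTF-16.
deseret_capital = "\U00010400"  # DESERET CAPITAL LETTER LONG I
utf16_units = len(deseret_capital.encode("utf-16-le")) // 2

# Its lowercase mapping is 1:1, so returning one 32-bit code point is enough.
deseret_small = deseret_capital.lower()

# 1:n mapping: U+00DF (sharp s) uppercases to two characters,
# which no single-code-point return value can hold.
sharp_s_upper = "\u00DF".upper()

# Context-sensitive mapping: capital sigma lowercases to final sigma
# (U+03C2) at the end of a word, and to U+03C3 when it stands alone.
word_lower = "\u039F\u0394\u039F\u03A3".lower()
lone_sigma_lower = "\u03A3".lower()
```

Language-sensitive mappings, such as the Turkish dotted and dotless i, are not covered by these built-ins and need a locale-aware library.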
-- Opinions expressed here may not reflect my company's positions unless otherwise noted.