Re: U+xxxx, U-xxxxxx, and the basics

From: Addison Phillips [GSC] (addison@globalsight.com)
Date: Fri Mar 03 2000 - 21:34:53 EST


Hi Mike,

I detect an error in:

>The first half of the pair is
>always in the 0xD000..0xD7FF range, and the second half of the pair is in
>the 0x0..0xFFFF range. Unicode 3.0 and ISO/IEC 10646-1;2000 have adopted the
>UTF-16 mechanism as the only official usage of the 0xD000..0xD7FF scalar
>range.

Actually, surrogate pairs are better designed than that. In UTF-16, no
16-bit value is "overloaded" or used for anything other than itself. That
is, 0x212B is always ANGSTROM SIGN and never anything but ANGSTROM SIGN. It
is *never* half of a surrogate pair (in the same way that no byte of a
multi-byte UTF-8 sequence is, in and of itself, a character in UTF-8).
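
To illustrate the point, here is a minimal Python sketch. The helper name
utf16_units is my own invention for this example, not part of any standard
API. It lists the 16-bit code units that Python's built-in "utf-16-be" codec
produces for a string, so you can see that a BMP character is a single unit
equal to its own code point, while a supplementary character becomes two
units that both fall in the surrogate area.

```python
import struct


def utf16_units(text):
    """Return the 16-bit code units of text encoded as big-endian UTF-16.

    Hypothetical helper for illustration only.
    """
    data = text.encode("utf-16-be")
    # Each UTF-16 code unit is one unsigned big-endian 16-bit integer.
    return list(struct.unpack(">%dH" % (len(data) // 2), data))


# U+212B ANGSTROM SIGN: one code unit, identical to the code point.
angstrom = utf16_units("\u212b")

# U+10000, the first supplementary character: a lead unit plus a trail unit.
pair = utf16_units("\U00010000")
```

Neither unit of the pair can be mistaken for a BMP character, because no
character is assigned in 0xD800 -> 0xDFFF.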

So, lead surrogate values are in the range: 0xD800 -> 0xDB7F
--> Note that the high private-use surrogates, 0xDB80 -> 0xDBFF, are
    listed separately; they are lead surrogates too
Trailing surrogate values are in the range: 0xDC00 -> 0xDFFF

U+D000 -> U+D7A3 are all Hangul syllables, and nothing in U+D000 -> U+D7FF
is a surrogate.
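
The ranges above can be sketched as a small classifier in Python. This is
only an illustration under my own assumptions: the function names
classify_unit and combine and the returned labels are hypothetical, and the
0xDB80 -> 0xDBFF range for the high private-use surrogates is filled in from
the Unicode code charts rather than from the text quoted above.

```python
def classify_unit(unit):
    """Classify one 16-bit UTF-16 code unit (labels are illustrative)."""
    if not 0 <= unit <= 0xFFFF:
        raise ValueError("not a 16-bit value: %r" % (unit,))
    if 0xD800 <= unit <= 0xDB7F:
        return "lead"
    if 0xDB80 <= unit <= 0xDBFF:
        return "lead (private use)"
    if 0xDC00 <= unit <= 0xDFFF:
        return "trail"
    # Everything else stands for itself and is never half of a pair.
    return "bmp"


def combine(lead, trail):
    """Combine a lead/trail surrogate pair into one code point."""
    if not (0xD800 <= lead <= 0xDBFF and 0xDC00 <= trail <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((lead - 0xD800) << 10) + (trail - 0xDC00)
```

Because the three ranges do not overlap each other or any assigned
character, a single comparison per code unit is enough to tell where you are
in a UTF-16 stream; for example, classify_unit(0xD000) reports a plain BMP
value.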

Thanks,

Addison

Addison P. Phillips
Senior Globalization Consultant
Global Sight Corporation
mailto:addison@globalsight.com
================================
101 Metro Drive, Suite 750
San Jose, California 95110
(+1) 408.350.3600 - Telephone
(+1) 408.350.3601 - Fax
http://www.globalsight.com
================================



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:59 EDT