RE: U+xxxx, U-xxxxxx, and the basics

From: Marco.Cimarosti@icl.com
Date: Wed Mar 08 2000 - 09:49:20 EST


Keld Jørn Simonsen wrote, responding to Mike Brown:

>>In Unicode, each abstract character is mapped to a scalar
>>value in the range 0x0..0x10FFFF. This "Unicode scalar
>>value" uniquely identifies the character. [...]
>Is it so? Last time I looked Unicode characters did not go
>beyond 0xFFFF. Then "surrogates" were defined as
>characters, and two surrogates could be joined to form
>something else [...].

I understood that Unicode had extended beyond the 0x0..0xFFFF range. The
fact that no code point is assigned yet in the 0x10000..0x10FFFF range does
not mean that these code points don't exist.
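To make the surrogate arithmetic concrete, here is a minimal sketch of how a pair of 16-bit surrogates maps onto one scalar value in the 0x10000..0x10FFFF range. The function name combine_surrogates is made up for illustration; the ranges 0xD800..0xDBFF (high) and 0xDC00..0xDFFF (low) are the standard surrogate blocks.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Combine a high/low surrogate pair into a single scalar value.

    Hypothetical helper, for illustration only.
    """
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("not a high surrogate: %04X" % high)
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("not a low surrogate: %04X" % low)
    # Each surrogate carries 10 bits; the pair is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# The first and last pairs bracket the whole supplementary range.
print("%X" % combine_surrogates(0xD800, 0xDC00))
print("%X" % combine_surrogates(0xDBFF, 0xDFFF))
```

So the pairs do not add characters "beyond" the scalar range; they are just a 16-bit way of spelling values inside it.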

>> * in Unicode notation: U-00212B
>Hmm. normally only 4 hex, U-212B or U212B

My understanding is different.
4 hex digits are associated with the traditional "U+" notation (e.g.
"U+212B"). I understand that the "U-" notation requires 6 digits (e.g.
"U-00212B"), as Mike suggested. I think that the "U" notation serves both
uses.

For us poor non-Americans, whose 3.0 book is still sailing on a ship at sea,
could the editors say a word about this?

>> * in Unicode notation, by its Unicode scalar value: U-010335
>Always 4 or 8 hex in a "U" name.

Again, I think 4 or *6*.
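
A minimal sketch of the two notations as I read them in this thread. The digit counts are an assumption: 4 hex digits for "U+" and 6 for "U-" is exactly the point under dispute, and the standard may say otherwise. The helper names u_plus and u_minus are made up for illustration.

```python
MAX_SCALAR = 0x10FFFF  # upper end of the scalar range quoted above


def u_plus(cp: int) -> str:
    """Format as 'U+' with 4 hex digits (assumed: 16-bit values only)."""
    if not 0 <= cp <= 0xFFFF:
        raise ValueError("value does not fit in 4 hex digits")
    return "U+%04X" % cp


def u_minus(cp: int) -> str:
    """Format as 'U-' with 6 hex digits (assumed width, see note above)."""
    if not 0 <= cp <= MAX_SCALAR:
        raise ValueError("value outside 0x0..0x10FFFF")
    return "U-%06X" % cp


print(u_plus(0x212B))
print(u_minus(0x212B))
print(u_minus(0x10335))
```

Six digits are enough because the largest scalar value, 0x10FFFF, has six hex digits.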


