Re: U+xxxx, U-xxxxxx, and the basics

From: Keld Jørn Simonsen (
Date: Wed Mar 08 2000 - 10:32:41 EST

On Wed, Mar 08, 2000 at 06:44:52AM -0800, wrote:
> Keld Jørn Simonsen wrote, responding to Mike Brown:
> >>In Unicode, each abstract character is mapped to a scalar
> >>value in the range 0x0..0x10FFFF. This "Unicode scalar
> >>value" uniquely identifies the character. [...]
> >Is it so? Last time I looked Unicode characters did not go
> >beyond 0xFFFF. Then "surrogates" were defined as
> >characters, and two surrogates could be joined to form
> >something else [...].
> I understood that Unicode had extended beyond the 0x0..0xFFFF range. The
> fact that no code point is assigned yet in the 0x10000..0x10FFFF range does
> not mean that these code points don't exist.

Yes, but my last reading was that surrogates are characters.
Maybe that changed with 3.0.
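How two surrogates join into one value can be shown with the UTF-16 formula: a high surrogate (0xD800..0xDBFF) carries the top 10 bits and a low surrogate (0xDC00..0xDFFF) the bottom 10 bits of an offset from 0x10000, which reaches exactly up to 0x10FFFF. A minimal Python sketch follows; the function names are hypothetical, and it treats surrogates as code units of an encoding rather than as characters, which is how later Unicode versions describe them.

```python
def to_surrogates(scalar: int) -> tuple[int, int]:
    """Split a scalar value in 0x10000..0x10FFFF into a UTF-16 surrogate pair."""
    if not 0x10000 <= scalar <= 0x10FFFF:
        raise ValueError("only values above 0xFFFF need surrogates")
    offset = scalar - 0x10000         # 20-bit offset into the supplementary planes
    high = 0xD800 + (offset >> 10)    # top 10 bits go into the high surrogate
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits go into the low surrogate
    return high, low


def from_surrogates(high: int, low: int) -> int:
    """Join a high/low surrogate pair back into a single scalar value."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogates(0x10335)
print(f"{high:04X} {low:04X}")  # D800 DF35
```

Python's own UTF-16 codec produces the same pair, e.g. `"\U00010335".encode("utf-16-be")` gives the bytes D8 00 DF 35.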

> >> * in Unicode notation: U-00212B
> >Hmm. Normally only 4 hex digits: U-212B or U212B.
> My understanding is different.
> 4 hex digits are associated with the traditional "U+" notation (e.g.
> "U+212B"). I understand that the "U-" notation requires 6 digits (e.g.
> "U-00212B"), as Mike suggested. I think that the "U" notation serves both
> uses.
> >> * in Unicode notation, by its Unicode scalar value: U-010335
> >Always 4 or 8 hex in a "U" name.
> Again, I think 4 or *6*.

Possibly; I don't have Unicode 3.0 at hand.
Anyway, that would again differ from ISO 10646, which clearly says 4 or 8 hex digits.
Some text should be added to point out that difference.
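The notations being compared differ only in prefix and digit count, so a short formatter makes the difference concrete. This Python sketch is illustrative only: the function names are hypothetical, "U+" is restricted to exactly 4 digits (values up to 0xFFFF) as described in this thread, and "U-" takes either the 6 digits suggested for Unicode 3.0 or the 8 digits ISO 10646 uses.

```python
def u_plus(scalar: int) -> str:
    """'U+' form with exactly 4 hex digits, so only values up to 0xFFFF."""
    if not 0 <= scalar <= 0xFFFF:
        raise ValueError("4-digit U+ form only covers 0x0000..0xFFFF")
    return f"U+{scalar:04X}"


def u_minus(scalar: int, digits: int = 6) -> str:
    """'U-' form: 6 digits as proposed for Unicode, 8 digits as in ISO 10646."""
    if digits not in (6, 8):
        raise ValueError("this sketch only models 6- or 8-digit U- forms")
    if not 0 <= scalar < 16 ** digits:
        raise ValueError("value does not fit in the requested number of digits")
    return f"U-{scalar:0{digits}X}"


print(u_plus(0x212B))            # U+212B
print(u_minus(0x212B))           # U-00212B
print(u_minus(0x10335))          # U-010335
print(u_minus(0x10335, digits=8))  # U-00010335
```

The 6-digit and 8-digit forms name the same value and differ only in zero padding, which is why a short note reconciling the two conventions would be enough.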


This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:59 EDT