RE: Astral planes (was: RE: Plane One use, was Re: HTML Validation)

From: Rick Cameron (Rick.Cameron@crystaldecisions.com)
Date: Tue Dec 18 2001 - 19:16:40 EST


That's great! The situation is much clearer to me now, and I'll revise my
Unicode evangelising accordingly.

Thanks!

- rick cameron

-----Original Message-----
From: Asmus Freytag [mailto:asmusf@ix.netcom.com]
Sent: Tuesday, 18 December 2001 16:22
To: Rick Cameron; unicode@unicode.org
Subject: RE: Astral planes (was: RE: Plane One use, was Re: HTML Validation)

At 03:38 PM 12/18/01 -0800, Rick Cameron wrote:
>Are you planning to add an explicit statement to the Unicode standard
>that the valid range for scalar values is 0..10FFFF? (Or is such a
>statement there, and I've just missed it?)

see below:

>In particular, as the use of 32-bit variables to hold Unicode
>characters becomes more common (apparently most unices make wchar_t 32
>bits wide), many will imagine that such a variable represents a 32-bit
>encoding of Unicode, with range 0..FFFFFFFF, where it just happens that
>every value above 10FFFF is unassigned.
>
>Of course, the Unicode Standard 3.0 doesn't even mention a 32-bit
>encoding - but that's not stopping uniphiles from storing Unicode data
>in their wchar_t's!

The only way such use is conformant is if it follows UTF-32. The latter is
clearly specified in http://www.unicode.org/unicode/reports/tr19/ as:

"The following lists the important features of this encoding form:

UTF-32 is restricted in values to the range 0..10FFFF, which precisely
matches the range of characters defined in the Unicode Standard (and other
standards such as XML), and those representable by UTF-8 and UTF-16. "
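
[Illustration: a minimal sketch of the range restriction quoted above, for
code that keeps characters in a 32-bit variable. The helper names are
hypothetical. The second function also rejects the surrogate code points
D800..DFFF; that exclusion is an assumption taken from later versions of the
standard and is not stated in the passage quoted here.]

```c
#include <stdbool.h>
#include <stdint.h>

/* Upper bound of the range 0..10FFFF quoted from the UTF-32 report. */
#define UNICODE_MAX 0x10FFFFu

/* Hypothetical helper: true if a 32-bit unit lies within 0..10FFFF. */
static bool utf32_in_range(uint32_t u)
{
    return u <= UNICODE_MAX;
}

/* Hypothetical stricter check. Assumption (not in the quoted text):
   surrogate code points D800..DFFF are not valid on their own. */
static bool utf32_is_scalar_value(uint32_t u)
{
    return u <= UNICODE_MAX && !(u >= 0xD800u && u <= 0xDFFFu);
}
```

Under this sketch a 32-bit value such as FFFFFFFF is rejected as invalid
rather than treated as an unassigned character.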

And Unicode 3.1 (in http://www.unicode.org/unicode/reports/tr27/) states:

"Status of UTF-32
Unicode Technical Report #19, UTF-32, has been elevated to the status of a
Unicode Standard Annex, making UTF-32 officially a part of the Unicode
Standard.

...

Because UTF-32 is a fixed-width, 32-bit encoding form, the numerical value
of a Unicode character in UTF-32 is always precisely identical to the
Unicode scalar value.

"
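
[Illustration: a short sketch contrasting the two encoding forms. In UTF-32
the code unit is already the scalar value, while a UTF-16 surrogate pair has
to be combined arithmetically. The function names are hypothetical, and the
pair decoder assumes its inputs are a well-formed high/low surrogate pair; it
does no validation.]

```c
#include <stdint.h>

/* UTF-32: the 32-bit code unit is the scalar value, so this is the identity. */
static uint32_t scalar_from_utf32(uint32_t unit)
{
    return unit;
}

/* UTF-16: combine a high surrogate (D800..DBFF) and a low surrogate
   (DC00..DFFF) into a scalar value in 10000..10FFFF.
   Assumes the caller has already checked both ranges. */
static uint32_t scalar_from_utf16_pair(uint16_t high, uint16_t low)
{
    uint32_t hi_bits = (uint32_t)high - 0xD800u; /* top 10 bits */
    uint32_t lo_bits = (uint32_t)low - 0xDC00u;  /* bottom 10 bits */
    return 0x10000u + (hi_bits << 10) + lo_bits;
}
```

For example, the pair D834 DD1E and the single UTF-32 unit 0001D11E both
yield the scalar value 1D11E, and the largest pair, DBFF DFFF, yields 10FFFF,
the top of the range.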

When Unicode 4.0 is published, we'll further clean up the language, among
other changes, so that the text no longer depends on a reference to a separate
UTF-32 document. I'm confident that seeing all the revisions applied to the
text of chapter three, plus our usual editorial tweaks, will make the
misunderstanding you had much less likely.

A./

Technical Vice President
The Unicode Consortium
Liaison to ISO/IEC JTC1/SC2/WG2


