RE: Astral planes (was: RE: Plane One use, was Re: HTML Validation)

From: Rick Cameron (Rick.Cameron@crystaldecisions.com)
Date: Tue Dec 18 2001 - 19:44:35 EST


OK, so it is there in 3.0. But in the section on Surrogates? And on
Transformations? A little obscure.

I expected to find it in section 2.3, for example, where the major encoding
forms are being described; or even earlier - say in 1.1 Coverage. Surely the
range of valid scalar values is an important aspect of coverage!

I hope this aspect of the standard will be front and centre in 4.0.

Thanks

- rick cameron

-----Original Message-----
From: Kenneth Whistler [mailto:kenw@sybase.com]
Sent: Tuesday, 18 December 2001 16:35
To: Rick.Cameron@crystaldecisions.com
Cc: unicode@unicode.org
Subject: RE: Astral planes (was: RE: Plane One use, was Re: HTML Validation)

Rick Cameron asked:

> Are you planning to add an explicit statement to the Unicode standard
> that the valid range for scalar values is 0..10FFFF? (Or is such a
> statement there, and I've just missed it?)

Unicode 3.0, p. 45, D28:

Unicode scalar value: a number N from 0 to 10FFFF₁₆...

and p. 46, D29, second bullet:

* Any sequence of code values that would correspond to a scalar value
  greater than 10FFFF₁₆ is illegal.
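
As a rough illustration of the limit stated in D28 and D29, a range check
might look something like the sketch below. The names in_scalar_range and
MAX_SCALAR_VALUE are made up for this example and do not come from the
standard. The check covers only the 0..10FFFF bound discussed here, and
assumes that any separate treatment of surrogate values is handled
elsewhere.

```python
# Hypothetical helper for illustration only; the names are assumptions,
# not anything defined by the Unicode Standard.
MAX_SCALAR_VALUE = 0x10FFFF  # upper bound quoted from D28


def in_scalar_range(n: int) -> bool:
    """Return True if n lies in 0..10FFFF, the range given in D28.

    Assumption: only the bounds are checked; surrogate values are not
    treated specially here.
    """
    return 0 <= n <= MAX_SCALAR_VALUE


# A 32-bit variable can hold many values that lie outside this range.
candidates = [0x41, 0x10FFFF, 0x110000, 0xFFFFFFFF]
results = [in_scalar_range(n) for n in candidates]
```

So a value such as 110000 or FFFFFFFF fits in a 32-bit variable but falls
outside the range, rather than being an unassigned value inside it.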

>
> In the absence of such a statement, I think it's very easy for people
> to get the idea that the range of scalar values is unbounded above,
> and that any limit is a property of a particular encoding.
>
> In particular, as the use of 32-bit variables to hold Unicode
> characters becomes more common (apparently most unices make wchar_t 32
> bits wide), many will imagine that such a variable represents a 32-bit
> encoding of Unicode, with range 0..FFFFFFFF, where it just happens
> that every value above 10FFFF is unassigned.
>
> I am one such person (but no longer!)
>
> Of course, the Unicode Standard 3.0 doesn't even mention a 32-bit
> encoding - but that's not stopping uniphiles from storing Unicode data
> in their wchar_t's!

It's the Unicode Standard 3.1 that you need to be referring to. UTF-32 was
incorporated into the standard at that point. See

http://www.unicode.org/unicode/reports/tr27/

--Ken


