Asmus Freytag <email@example.com> wrote:
> There are 0x10FFFF - 34 possible characters!
> All code values ending in 0xFFFE and 0xFFFF do *not* refer to
> characters. They are not just temporarily unassigned, but permanently
> reserved as non-characters.
Right, but we should start with 0x110000, not 0x10FFFF (since U+0000
NULL is a perfectly legitimate character), then subtract 34 (U+??FFFE
and U+??FFFF for each of 17 planes), then subtract another 2,048 for
the surrogate codepoints (U+D800 through U+DFFF). That leaves us with
1,112,030 possible characters. There will be a test next period.
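For anyone who wants to check the arithmetic before the test, here is a quick sketch in Python. The constant names are my own, purely for illustration; the numbers are the ones given above.

```python
# Count the code points that could ever be assigned as characters,
# following the arithmetic in the paragraph above.
# All names here are illustrative, not taken from any standard or library.
CODE_SPACE = 0x110000              # U+0000 through U+10FFFF, inclusive
PLANES = 17                        # planes 0 through 16
NONCHARS_PER_PLANE = 2             # U+??FFFE and U+??FFFF
SURROGATES = 0xDFFF - 0xD800 + 1   # U+D800 through U+DFFF

possible = CODE_SPACE - PLANES * NONCHARS_PER_PLANE - SURROGATES
print(f"{possible:,}")             # prints 1,112,030
```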
Then Robert Lozyniak <firstname.lastname@example.org> wrote:
> Okay, 0x10FFDE different characters. But what of planes 15 and 16?
Planes 15 and 16 are for private-use characters, just like the range
from U+E000 to U+F8FF. These still count as "possible characters."
And then "john" <email@example.com> wrote:
> Clarification request: Does that mean
> None of the code values ending in 0xFFFE and 0xFFFF refer to
> characters, or
> Not all of the code values ending in 0xFFFE and 0xFFFF refer to
> characters (i.e. some do and some do not)?
The first one. For all x where ((x & 0x00FFFE) == 0x00FFFE), x is not
a valid character.
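As a rough sketch, that test can be written as a small Python function. The function name is made up for this example, and it covers only the U+??FFFE / U+??FFFF code points discussed in this thread. Masking with 0xFFFE works because FFFE and FFFF differ only in the lowest bit.

```python
def is_plane_final_nonchar(x: int) -> bool:
    """True if x ends in FFFE or FFFF (hypothetical helper name)."""
    # The mask ignores the lowest bit, so both ...FFFE and ...FFFF match.
    return (x & 0x00FFFE) == 0x00FFFE

# Scan the whole code space: two such values in each of the 17 planes.
count = sum(1 for x in range(0x110000) if is_plane_final_nonchar(x))
print(count)  # prints 34
```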
BTW, it's interesting that the FAQ claims this is "for no good reason,"
when in fact I can think of a good reason to at least exclude the
characters ending in FFFE: if expressed in UTF-32 little-endian and
appearing at the beginning of a file, they could fool an auto-detection
scheme into thinking the file is UTF-16 big-endian.
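To make the byte-level collision concrete, here is a small Python sketch. I use struct.pack to lay out the 32-bit little-endian value directly instead of relying on any codec's handling of non-characters; the helper name and the sample code points are just for illustration.

```python
import struct

UTF16_BE_BOM = b"\xfe\xff"  # U+FEFF serialized as UTF-16 big-endian

def utf32le_bytes(code_point: int) -> bytes:
    """Lay out a code point as one 32-bit little-endian unit (illustrative helper)."""
    return struct.pack("<I", code_point)

for cp in (0xFFFE, 0x1FFFE, 0x10FFFE):
    data = utf32le_bytes(cp)
    # The first two bytes are FE FF, exactly what a UTF-16BE BOM looks like.
    print(f"U+{cp:04X}: {data.hex()} looks like UTF-16BE BOM: "
          f"{data.startswith(UTF16_BE_BOM)}")
```

The same layout for a value ending in FFFF starts with FF FF, which matches no byte order mark, so this argument covers only the FFFE half.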