Re: Timetables and conventions (was RE: Chapter on character sets)

From: Antoine Leca (
Date: Fri Jun 16 2000 - 13:52:03 EDT

Kenneth Whistler wrote:
> The same conventions will be used for citation of characters in Planes
> above Plane 0 in Unicode Technical Reports and in the eventual republication
> of the standard itself. In textual citations, the normal usage will
> include the "U+" prefix: U+1D141, etc.

Ah, that is new!

It was my understanding that we do not write U+5F (_), U+410 (Cyrillic A), etc.,
because the U+ notation always takes exactly four hex digits.

The U+ notation is carefully described in ISO 10646 (or so I think), and
I remember reading that U+xxxx is the same as U-0000xxxx (which means
that there is a relationship between UCS-2 and UCS-4), so I expected
U-0001D141 instead.
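
To make the two notations concrete, here is a minimal sketch in Python. The
function names u_plus and u_minus are hypothetical, chosen only for this
illustration. It assumes the short form is padded to at least four hex digits
and grows when the value needs more, as in Ken's U+1D141, and that the long
form always has eight digits.

```python
MAX_CODE_POINT = 0x10FFFF  # upper bound quoted in Ken's message


def u_plus(cp: int) -> str:
    """Short form: at least four hex digits, more only when needed."""
    if not 0 <= cp <= MAX_CODE_POINT:
        raise ValueError(f"code point out of range: {cp!r}")
    return f"U+{cp:04X}"


def u_minus(cp: int) -> str:
    """Long form: always eight hex digits."""
    if cp < 0:
        raise ValueError(f"code point out of range: {cp!r}")
    return f"U-{cp:08X}"


if __name__ == "__main__":
    for cp in (0x5F, 0x410, 0x1D141):
        print(u_plus(cp), u_minus(cp))
```

With this padding rule, 0x5F is written U+005F rather than U+5F, and 0x1D141
has the short form U+1D141 and the long form U-0001D141.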



> Parsers of the Unicode Character Database files
> will have to be modified if they have built-in assumptions that
> character values are always 4-digit hex values. Now they should be
> extended to allow for 6-digit hex values in the data files, and they
> should be prepared to cope with integers in the range 0..0x10FFFF,
> rather than just integers in the range 0..0xFFFF.

Since the extended private-use ranges are already assigned, I believe that
Ken can very easily create a 3.1-alpha release, with material identical to
3.0 except for two added lines, one with 0xF0000 <First extended PU>, and a
second with 0x10FFFF <Last extended PU>. So certainly people could adjust
their parsers "live".
Prospective ;-)
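
The parser adjustment Ken describes could look roughly like the following
Python sketch. It assumes a semicolon-separated data line whose first field
is the code point in hex, as in UnicodeData.txt. The names parse_code_point
and parse_line and the sample lines are hypothetical and serve only as
illustration.

```python
import re

MAX_CODE_POINT = 0x10FFFF

# Accept 4 to 6 hex digits instead of assuming exactly 4.
_HEX_FIELD = re.compile(r"[0-9A-Fa-f]{4,6}")


def parse_code_point(field: str) -> int:
    """Parse a code point field and check it lies in 0..0x10FFFF."""
    text = field.strip()
    if not _HEX_FIELD.fullmatch(text):
        raise ValueError(f"not a 4- to 6-digit hex value: {field!r}")
    value = int(text, 16)
    if value > MAX_CODE_POINT:
        raise ValueError(f"beyond 0x10FFFF: {field!r}")
    return value


def parse_line(line: str):
    """Split one data line into (code point, remaining fields)."""
    fields = line.rstrip("\n").split(";")
    return parse_code_point(fields[0]), fields[1:]


if __name__ == "__main__":
    # Illustrative sample lines, not copied from a real data file.
    for sample in ("005F;LOW LINE", "10FFFF;<Last extended PU>"):
        print(parse_line(sample))
```

A parser hard-wired to four digits would reject or truncate the second sample
line; this version accepts it and still refuses values such as 110000.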


This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:04 EDT