Re: Timetables and conventions

From: Doug Ewell (dewell@compuserve.com)
Date: Sat Jun 17 2000 - 02:33:34 EDT


Kenneth Whistler <kenw@sybase.com> wrote:

> Actually, it is currently under discussion for the Unicode 3.0.1
> update version release, which is imminent. (No new characters, just
> some minor fixes for some data files, etc.)
>
> UnicodeData.txt, which currently contains entries like:
>
> E000;<Private Use, First>;Co;0;L;;;;;N;;;;;
> F8FF;<Private Use, Last>;Co;0;L;;;;;N;;;;;
>
> May be extended to contain corresponding entries:
>
> F0000;<Plane 15 Private Use, First>;Co;0;L;;;;;N;;;;;
> FFFFD;<Plane 15 Private Use, Last>;Co;0;L;;;;;N;;;;;
> 100000;<Plane 16 Private Use, First>;Co;0;L;;;;;N;;;;;
> 10FFFD;<Plane 16 Private Use, Last>;Co;0;L;;;;;N;;;;;
>
> which would give everybody an easy test to see if their parsers
> croak!
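
The parser test described in the quoted text can be sketched in a few
lines of Python. This is only an illustration, not anything taken from
the Unicode distribution: the names SAMPLE, RANGE_NAME and parse_ranges
are hypothetical, and the sample lines are copied from the message above
(the Plane 15 and Plane 16 entries were only proposed at that point).

```python
import re

# Sample entries copied from the quoted message; the Plane 15/16 lines
# are the proposed additions, not confirmed file contents.
SAMPLE = """\
E000;<Private Use, First>;Co;0;L;;;;;N;;;;;
F8FF;<Private Use, Last>;Co;0;L;;;;;N;;;;;
F0000;<Plane 15 Private Use, First>;Co;0;L;;;;;N;;;;;
FFFFD;<Plane 15 Private Use, Last>;Co;0;L;;;;;N;;;;;
100000;<Plane 16 Private Use, First>;Co;0;L;;;;;N;;;;;
10FFFD;<Plane 16 Private Use, Last>;Co;0;L;;;;;N;;;;;
"""

# Matches names of the form "<Label, First>" or "<Label, Last>".
RANGE_NAME = re.compile(r"^<(.+), (First|Last)>$")


def parse_ranges(text):
    """Return a list of (first, last, label, category) tuples."""
    ranges = []
    pending = None  # (code point, label) of a First entry awaiting its Last
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split(";")
        # Parse the code point as hex of any length; a parser that assumes
        # exactly four hex digits is the kind that would "croak" here.
        code = int(fields[0], 16)
        match = RANGE_NAME.match(fields[1])
        if match is None:
            continue  # ordinary single-character entry; ignored in this sketch
        label, which = match.groups()
        if which == "First":
            pending = (code, label)
        else:
            if pending is None or pending[1] != label:
                raise ValueError(f"unmatched Last entry: {line!r}")
            ranges.append((pending[0], code, label, fields[2]))
            pending = None
    return ranges


for first, last, label, category in parse_ranges(SAMPLE):
    print(f"{first:06X}..{last:06X} {category} {label}")
```

A parser that stores code points in 16 bits, or reads a fixed four-digit
field, would fail on the last four lines, which is the point of the test.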

I would strongly support the addition of this feature to the 3.0.1
UnicodeData.txt file. It would be an excellent idea, not only for the
reason Ken gave, but also to enshrine in UnicodeData.txt the fact that
(a) "characters" whose scalar values mod 0x10000 equal 0xFFFE or 0xFFFF
are not true characters in Unicode, and (b) Unicode ends at Plane 16, so
the last character is U+10FFFD, not U+10FFFF (using the new notation).
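
Points (a) and (b) reduce to a little arithmetic, sketched here in
Python. The function names and the MAX_SCALAR constant are hypothetical,
chosen for illustration, and the check covers only the two per-plane
values mentioned above.

```python
MAX_SCALAR = 0x10FFFF  # top of Plane 16, the last plane


def is_plane_final_noncharacter(cp):
    """True if cp mod 0x10000 is 0xFFFE or 0xFFFF, as in point (a)."""
    return 0 <= cp <= MAX_SCALAR and (cp & 0xFFFF) in (0xFFFE, 0xFFFF)


def last_usable_in_plane(plane):
    """Highest value in a plane that is not one of the two excluded ones."""
    return (plane << 16) | 0xFFFD


# Point (b): the top two values of Plane 16 are excluded, so U+10FFFD
# is the last value that can be a character.
assert is_plane_final_noncharacter(0x10FFFE)
assert is_plane_final_noncharacter(0x10FFFF)
assert not is_plane_final_noncharacter(0x10FFFD)
assert last_usable_in_plane(16) == 0x10FFFD
assert last_usable_in_plane(15) == 0xFFFFD
```

The same test gives U+FFFD as the last usable value in Plane 0, which
matches the Plane 15 and Plane 16 "Last" entries ending in FFFD.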

-Doug Ewell
 Fullerton, California


