From: Doug Ewell (email@example.com)
Date: Fri Nov 21 2003 - 00:02:49 EST
Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:
> What is a browser supposed to do if it finds an out-of-range GB
> sequence that is NOT mapped to Unicode? Does GB18030 specify that
> these sequences are now "invalid" (and permanently assigned to non-
> characters, like U+FFFF in Unicode), and not "reserved" for future use
> (like "unassigned" code points in Unicode) ?
An invalid GB18030 sequence, like <FE 40>, or a valid but out-of-range
sequence, like <E3 32 9A 36>, should be treated just like an invalid or
out-of-range UTF-8 sequence. Issue an error message, format the hard
disk, whatever; just don't try to treat it like a normal character.
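To make the out-of-range case concrete, here is a rough Python sketch of the range check for four-byte GB18030 sequences in the supplementary-plane block. It assumes the usual linear mapping in which <90 30 81 30> corresponds to U+10000; the function name decode_supplementary is made up for illustration, and the sketch ignores one-byte, two-byte, and BMP four-byte sequences entirely.

```python
# Illustrative sketch only: covers just the four-byte GB18030 sequences
# that map linearly onto the supplementary planes (U+10000..U+10FFFF).
# decode_supplementary is a hypothetical helper, not part of any library.

def decode_supplementary(seq: bytes) -> int:
    """Return the code point for a four-byte supplementary sequence,
    or raise ValueError if the sequence is malformed or out of range."""
    if len(seq) != 4:
        raise ValueError("expected exactly four bytes")
    b1, b2, b3, b4 = seq
    # Structural check: lead byte 90..FE, digits 30..39, third byte 81..FE.
    if not (0x90 <= b1 <= 0xFE and 0x30 <= b2 <= 0x39
            and 0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39):
        raise ValueError("not a four-byte supplementary sequence")
    # Linear offset from <90 30 81 30>, which is assumed to map to U+10000.
    linear = ((b1 - 0x90) * 12600 + (b2 - 0x30) * 1260
              + (b3 - 0x81) * 10 + (b4 - 0x30))
    code_point = 0x10000 + linear
    if code_point > 0x10FFFF:
        # Well-formed bytes, but past the end of the Unicode code space:
        # report an error instead of inventing a character.
        raise ValueError("sequence is beyond U+10FFFF")
    return code_point
```

Under that assumption, <E3 32 9A 35> comes out as U+10FFFF, and <E3 32 9A 36> is one step past it, so the function raises instead of returning a code point.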
> This is critical, because I could fear that some future release of
> GB18030 may assign some functions to these sequences, which will be
> impossible to map onto Unicode, but only onto ISO/IEC-10646 "extra"
There ARE no "extra" planes in ISO/IEC 10646. They will not be used.
Ever. Forget you ever heard about them.
There are one hundred thirty-seven THOUSAND private-use code points. If
the Chinese insist on encoding characters in GB18030 that haven't been
approved by UTC and WG2, rest assured there will be plenty of room for
them in the PUA or EPUA.
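For anyone who wants to check the arithmetic behind that figure, a small Python sketch follows. The range boundaries are the Unicode private-use areas (the BMP PUA plus planes 15 and 16, excluding the two noncharacters at the end of each plane); the variable names are just for illustration.

```python
# Count the private-use code points in Unicode / ISO/IEC 10646.
# Variable names are illustrative only.
bmp_pua = range(0xE000, 0xF8FF + 1)          # PUA in the BMP
plane15_pua = range(0xF0000, 0xFFFFD + 1)    # plane 15, minus FFFFE/FFFFF
plane16_pua = range(0x100000, 0x10FFFD + 1)  # plane 16, minus 10FFFE/10FFFF

total_private_use = len(bmp_pua) + len(plane15_pua) + len(plane16_pua)
print(total_private_use)  # 6400 + 65534 + 65534 = 137468
```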