From: Philippe Verdy (firstname.lastname@example.org)
Date: Thu Nov 20 2003 - 17:20:20 EST
From: "Michael (michka) Kaplan" <email@example.com>
> If you want to test gb18030 support, then please encode a web page in
> gb18030 and test *that* in the browser of your choice.
> Now if you want to discuss NCR support then that may also be interesting.
> But it would be nice to have tests that actually cover what they claim to
Aren't NCRs supposed to contain ONLY a Unicode code point, even on
Testing a page with NCRs will only test Unicode support, not GB18030 support,
even if the Unicode code point in the NCR designates a character in the
supplementary ideographic plane (plane 2).
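To make the difference concrete, here is a minimal sketch using Python's standard `html` module and its built-in `gb18030` codec as a stand-in for a browser. The sample character U+20000 and the two page fragments are my own illustrative choices, not taken from any actual test page.

```python
import html

# Hypothetical test fragments: one uses an NCR, the other the real character.
NCR_PAGE = b"<p>&#x20000;</p>"                     # pure ASCII bytes
RAW_PAGE = "<p>\U00020000</p>".encode("gb18030")   # four-byte GB18030 sequence

def render(page: bytes, charset: str) -> str:
    """Decode the bytes with the declared charset, then resolve NCRs."""
    return html.unescape(page.decode(charset))

# The NCR page never exercises the GB18030 decoder: it is ASCII only, so it
# yields the same text whether it is labelled gb18030 or utf-8.
ncr_as_gb = render(NCR_PAGE, "gb18030")
ncr_as_utf8 = render(NCR_PAGE, "utf-8")

# Only the raw page depends on the decoder knowing the GB18030 mapping.
raw_as_gb = render(RAW_PAGE, "gb18030")
```

Both routes end at the same code point, but only `RAW_PAGE` contains bytes that a decoder has to map from GB18030 to Unicode.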
To really test GB18030, you need to encode the page with it, without using NCRs.
I.e. you need to know the mapping tables between GB18030 code positions and
Unicode code points, and implement the ranges table for those GB18030 code
positions that are algorithmically mapped on Unicode.
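As a rough illustration of the algorithmic part, the sketch below covers only the four-byte sequences for the supplementary planes (U+10000 to U+10FFFF), which form one linear range starting at 90 30 81 30. The function names are mine. The four-byte sequences that map into the BMP need the ranges table and are deliberately not handled here.

```python
# Linear four-byte mapping, supplementary planes only (U+10000..U+10FFFF).
# Byte ranges: b1 0x81..0xFE, b2 0x30..0x39, b3 0x81..0xFE, b4 0x30..0x39.
# The BMP four-byte region (lead bytes 0x81..0x84) is NOT covered.

def supplementary_to_gb4(cp: int) -> bytes:
    """Map a supplementary code point to its four-byte GB18030 sequence."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    n = cp - 0x10000
    n, b4 = divmod(n, 10)     # fourth byte: 10 values
    n, b3 = divmod(n, 126)    # third byte: 126 values
    b1, b2 = divmod(n, 10)    # second byte: 10 values
    return bytes((0x90 + b1, 0x30 + b2, 0x81 + b3, 0x30 + b4))

def gb4_to_supplementary(seq: bytes) -> int:
    """Inverse mapping; raises ValueError outside the linear range."""
    b1, b2, b3, b4 = seq      # also raises ValueError if len(seq) != 4
    well_formed = (0x90 <= b1 <= 0xFE and 0x30 <= b2 <= 0x39
                   and 0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39)
    if not well_formed:
        raise ValueError("not in the supplementary four-byte range")
    n = (((b1 - 0x90) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)
    cp = 0x10000 + n
    if cp > 0x10FFFF:
        raise ValueError("sequence lies beyond U+10FFFF")
    return cp
```

For example, `supplementary_to_gb4(0x20000)` gives the bytes 95 32 82 36, which agrees with what Python's own `gb18030` codec produces for that character.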
One subsidiary question.
What is a browser supposed to do if it finds an out-of-range GB sequence
that is NOT mapped to Unicode? Does GB18030 specify that these sequences are
now "invalid" (and permanently assigned to non-characters, like U+FFFF in
Unicode), and not "reserved" for future use (like "unassigned" code points
in Unicode)?
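I cannot answer this from the standard itself, but one can at least observe what an existing decoder does. The sketch below uses Python's built-in `gb18030` codec purely as an example implementation. It says nothing about what GB18030 intends for these sequences, and the helper name is mine.

```python
from typing import Optional

def try_decode(seq: bytes) -> Optional[str]:
    """Return the decoded text, or None if the codec rejects the bytes."""
    try:
        return seq.decode("gb18030")   # strict error handling by default
    except UnicodeDecodeError:
        return None

LAST_MAPPED = b"\xe3\x32\x9a\x35"      # maps to U+10FFFF
FIRST_BEYOND = b"\xe3\x32\x9a\x36"     # well-formed bytes, one past U+10FFFF

last_result = try_decode(LAST_MAPPED)
beyond_result = try_decode(FIRST_BEYOND)
```

This particular decoder accepts the first sequence and treats the second as an error, that is, as if it were invalid rather than reserved. Other implementations may behave differently.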
This is critical, because I fear that some future release of GB18030
may assign functions to these sequences that would be impossible to
map onto Unicode, but only onto ISO/IEC 10646 "extra" planes. My worst fear
is that these sequences could be used to define EUDC ideographic characters,
using some extra convention that allows encoding glyph forms (or sequences
of strokes and layout info) and assigning them to a PUA, directly within a
plain-text GB18030 document.
The alternative would be to create a model for grapheme clusters
adapted to Han ideographs, using ideographic description characters and
assigning code points to the composite Han strokes that make up the
ideograph. Then it would become possible to create a normative dictionary
between all existing Han ideographs and their composed strokes (with an
additional benefit as it could allow implementing collation order by stroke
more easily, using the normative Han description decomposition). This would
also help unifying new collections of ideographs and avoid duplicate
assignments for those ideographs that merit a distinct encoding as a single
This archive was generated by hypermail 2.1.5 : Thu Nov 20 2003 - 18:10:41 EST