RE: creating a test font w/ CJKV Extension B characters.

From: Andrew C. West (
Date: Fri Nov 21 2003 - 10:46:14 EST

    On Fri, 21 Nov 2003 15:12:26 +0100, "Philippe Verdy" wrote:
    > Could an editor loading such an incorrect but legacy GB-18030 file accept it
    > and work with it using an internal-only UCS-4 mapping (or an extended UTF-8
    > mapping), to preserve those out-of-range sequences, as if they were mapped in
    > an extra PUA range?

    An editor which stored data internally as extended UTF-32 or extended UTF-8
    could easily preserve such invalid codepoints, but BabelPad stores data
    internally as UTF-16 so it couldn't; and even if it could, it wouldn't, as it's a
    Unicode editor, and codepoints beyond U+10FFFF are not Unicode (nor, for that
    matter, are codepoints beyond <E3 32 9A 35>, the GB-18030 sequence for U+10FFFF,
    valid GB-18030 as far as I'm aware).
    The first thing I'll do this evening is change BabelPad so that GB-18030
    codepoints beyond <E3 32 9A 35> are converted to U+FFFD.
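The <E3 32 9A 35> cut-off follows from how GB-18030 maps the supplementary planes: a four-byte sequence can be read as a linear index, and the supplementary planes map algorithmically starting from <90 30 81 30> (U+10000), so U+10FFFF lands exactly on <E3 32 9A 35>. Below is a minimal Python sketch of the replacement behaviour described above. It assumes only the supplementary part of the four-byte range (first byte 0x90 and up) needs handling; the table-driven BMP part is deliberately left out, and the function names are hypothetical, not BabelPad's actual code.

```python
# Minimal sketch (hypothetical helper names, not BabelPad's code):
# decode a GB-18030 four-byte sequence from the supplementary range and
# replace anything beyond <E3 32 9A 35> (= U+10FFFF) with U+FFFD.

REPLACEMENT = 0xFFFD
SUPP_BASE = 189000                             # linear index of <90 30 81 30> = U+10000
MAX_LINEAR = SUPP_BASE + (0x10FFFF - 0x10000)  # linear index of <E3 32 9A 35>


def linear_index(seq: bytes) -> int:
    """Position of a well-formed four-byte sequence in GB-18030's four-byte space."""
    if len(seq) != 4:
        raise ValueError("expected exactly four bytes")
    b1, b2, b3, b4 = seq
    if not (0x81 <= b1 <= 0xFE and 0x30 <= b2 <= 0x39
            and 0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39):
        raise ValueError("not a well-formed four-byte sequence")
    return (b1 - 0x81) * 12600 + (b2 - 0x30) * 1260 + (b3 - 0x81) * 10 + (b4 - 0x30)


def decode_supplementary(seq: bytes) -> int:
    """Return the code point for seq, or U+FFFD if it lies beyond U+10FFFF."""
    lin = linear_index(seq)
    if lin < SUPP_BASE:
        # The BMP part of the four-byte range is table-driven; out of scope here.
        raise ValueError("below the supplementary range")
    if lin > MAX_LINEAR:
        return REPLACEMENT
    return 0x10000 + (lin - SUPP_BASE)


if __name__ == "__main__":
    for raw in (b"\x90\x30\x81\x30", b"\xE3\x32\x9A\x35", b"\xE3\x32\x9A\x36"):
        print(raw.hex(" ").upper(), "->", f"U+{decode_supplementary(raw):04X}")
```

Every sequence from <E3 32 9A 36> up to <FE 39 FE 39> falls through to U+FFFD, which is the behaviour a UTF-16-based editor is forced into anyway.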

    > Of course saving the file into a UTF encoding would be forbidden, but saving
    > the internal UCS-4 file back to GB-18030 would preserve those out-of-range
    > GB-18030 sequences, without making any other interpretation, and without
    > changing them arbitrarily into the GB18030 equivalent of U+FFFD?
    > The editor could still use the Unicode rules for all valid GB18030
    > sequences, and the invalid characters could then be represented, for example,
    > with a colored/highlighted glyph such as <U+110000>. As neither the input nor
    > the output is a Unicode scheme, I don't think this invalidates Unicode
    > conformance: the behavior would just be conforming to GB18030 or other
    > legacy GB PUA mappings.
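The round trip proposed above can be sketched in Python under one loudly stated assumption: the out-of-range sequences are treated as if the supplementary mapping simply continued past U+10FFFF, so <E3 32 9A 36> becomes the pseudo code point 0x110000 (matching the <U+110000> example). No standard defines this continuation, and all helper names here are hypothetical. The text is kept as a list of plain integers, an "extended UCS-4" buffer, so that re-encoding reproduces the original bytes.

```python
# Hypothetical "extended UCS-4" round trip: out-of-range GB-18030 sequences
# decode to pseudo code points >= 0x110000 and re-encode to the same bytes.
# The continuation past U+10FFFF is an illustrative assumption, not a standard.

SUPP_BASE = 189000      # linear index of <90 30 81 30> = U+10000
LAST_LINEAR = 1587599   # linear index of <FE 39 FE 39>, the last four-byte sequence


def linear_index(seq: bytes) -> int:
    b1, b2, b3, b4 = seq  # raises ValueError unless exactly four bytes
    if not (0x81 <= b1 <= 0xFE and 0x30 <= b2 <= 0x39
            and 0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39):
        raise ValueError("not a well-formed four-byte sequence")
    return (b1 - 0x81) * 12600 + (b2 - 0x30) * 1260 + (b3 - 0x81) * 10 + (b4 - 0x30)


def from_linear(lin: int) -> bytes:
    # Inverse of linear_index: split the index back into four byte positions.
    b1, rest = divmod(lin, 12600)
    b2, rest = divmod(rest, 1260)
    b3, b4 = divmod(rest, 10)
    return bytes([b1 + 0x81, b2 + 0x30, b3 + 0x81, b4 + 0x30])


def decode_extended(seq: bytes) -> int:
    """Supplementary-range sequence -> code point, continuing past U+10FFFF."""
    lin = linear_index(seq)
    if not SUPP_BASE <= lin <= LAST_LINEAR:
        raise ValueError("outside the supplementary/extended four-byte range")
    return 0x10000 + (lin - SUPP_BASE)


def encode_extended(cp: int) -> bytes:
    """Inverse of decode_extended, including pseudo code points >= 0x110000."""
    lin = SUPP_BASE + (cp - 0x10000)
    if not SUPP_BASE <= lin <= LAST_LINEAR:
        raise ValueError("no four-byte supplementary encoding for this value")
    return from_linear(lin)


def is_unicode(cp: int) -> bool:
    # Only these values can be written out as UTF-16, or as valid UTF-8/UTF-32.
    return 0 <= cp <= 0x10FFFF


if __name__ == "__main__":
    buffer = [decode_extended(b"\xE3\x32\x9A\x35"), decode_extended(b"\xE3\x32\x9A\x36")]
    print([hex(cp) for cp in buffer], [is_unicode(cp) for cp in buffer])
    print(b"".join(encode_extended(cp) for cp in buffer).hex(" ").upper())
```

Python's own `str` type has the same ceiling as UTF-16 (`chr(0x110000)` raises `ValueError`), which is why the sketch stores plain integers rather than a string. Any value for which `is_unicode` is false has no UTF-16 form and no valid UTF-8 or UTF-32 form, so saving to a UTF would indeed have to be refused.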

    I'm pretty sure that there are no such legacy GB mappings, and I doubt that China
    will ever want to map characters to extra-Unicode codepoints in GB-18030 ...
    they seem far more interested in trying to force everyone else to accept their
    unwanted characters in the BMP than putting them in some limbo beyond Plane 16.


