RE: creating a test font w/ CJKV Extension B characters.

From: Andrew C. West (andrewcwest@alumni.princeton.edu)
Date: Fri Nov 21 2003 - 10:46:14 EST


    On Fri, 21 Nov 2003 15:12:26 +0100, "Philippe Verdy" wrote:
    >
    > Could an editor that is given such an incorrect but legacy GB-18030 file
    > load it and work with it using an internal-only UCS-4 mapping (or an
    > extended UTF-8 mapping), to preserve those out-of-range sequences, as if
    > they were mapped in an extra PUA range?
    >

    An editor which stored data internally as extended UTF-32 or extended UTF-8
    could easily preserve such invalid codepoints. BabelPad, however, stores data
    internally as UTF-16, so it couldn't, and even if it could it wouldn't, as it's
    a Unicode editor and codepoints beyond U+10FFFF are not Unicode (nor, for that
    matter, are codepoints beyond <E3 32 9A 35> valid GB-18030, as far as I'm aware).
    The first thing I'll do this evening is change BabelPad so that GB-18030
    codepoints beyond <E3 32 9A 35> are converted to U+FFFD.
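
    The check described above can be sketched in Python. This is a minimal
    illustration, not BabelPad's actual code: the function names are
    hypothetical, and it assumes the usual linear layout of GB-18030 four-byte
    sequences, in which <90 30 81 30> corresponds to U+10000 and
    <E3 32 9A 35> to U+10FFFF. Four-byte sequences that map into the BMP need
    a mapping table and are deliberately left out.

```python
# Linear index of <90 30 81 30>, the four-byte sequence for U+10000,
# counted from <81 30 81 30> (assumption: standard GB-18030 layout).
SUPPLEMENTARY_BASE = 189000
REPLACEMENT_CHARACTER = 0xFFFD


def is_four_byte(seq: bytes) -> bool:
    """Check the byte ranges of a GB-18030 four-byte sequence."""
    return (
        len(seq) == 4
        and 0x81 <= seq[0] <= 0xFE
        and 0x30 <= seq[1] <= 0x39
        and 0x81 <= seq[2] <= 0xFE
        and 0x30 <= seq[3] <= 0x39
    )


def linear_index(seq: bytes) -> int:
    """Return the position of a four-byte sequence counted from <81 30 81 30>."""
    b1, b2, b3, b4 = seq
    return (((b1 - 0x81) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)


def decode_supplementary(seq: bytes) -> int:
    """Map a four-byte sequence at or above <90 30 81 30> to a code point.

    Sequences beyond <E3 32 9A 35> would land above U+10FFFF, so they are
    replaced with U+FFFD.
    """
    if not is_four_byte(seq):
        raise ValueError("not a well-formed GB-18030 four-byte sequence")
    offset = linear_index(seq) - SUPPLEMENTARY_BASE
    if offset < 0:
        raise ValueError("BMP four-byte sequences need a mapping table")
    code_point = 0x10000 + offset
    return code_point if code_point <= 0x10FFFF else REPLACEMENT_CHARACTER
```

    With these definitions, decode_supplementary(bytes.fromhex("E3329A35"))
    gives 0x10FFFF, while the next sequence, <E3 32 9A 36>, gives 0xFFFD.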

    > Of course saving the file into a UTF encoding would be forbidden, but saving
    > the internal UCS-4 file back to GB-18030 would preserve those out-of-range
    > GB-18030 sequences, without making any other interpretation, and without
    > changing them arbitrarily into the GB18030 equivalent of U+FFFD?
    >
    > The editor could still use the Unicode rules for all valid GB18030
    > sequences. And the invalid characters could then be represented, for example,
    > with a colored/highlighted glyph such as <U+110000>. As both the input and
    > output are not a Unicode scheme, I don't think this invalidates the Unicode
    > conformance: the behavior would just be conforming to GB18030 or other
    > legacy GB PUA mappings.
    >
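
    The round trip proposed in the quoted text can be sketched as follows.
    This is only an illustration under stated assumptions: the names
    to_internal and from_internal are hypothetical, the internal value is a
    plain integer standing in for an "extended UCS-4" unit, and only
    four-byte sequences from <90 30 81 30> upward are covered. Values above
    0x10FFFF are not Unicode code points and could not be written out in any
    UTF.

```python
# Linear index of <90 30 81 30>, the four-byte sequence for U+10000,
# counted from <81 30 81 30> (assumption: standard GB-18030 layout).
SUPPLEMENTARY_BASE = 189000


def to_internal(seq: bytes) -> int:
    """Map a four-byte sequence to an internal value; may exceed 0x10FFFF."""
    b1, b2, b3, b4 = seq
    linear = (((b1 - 0x81) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)
    if linear < SUPPLEMENTARY_BASE:
        raise ValueError("BMP four-byte sequences need a mapping table")
    return 0x10000 + linear - SUPPLEMENTARY_BASE


def from_internal(value: int) -> bytes:
    """Invert to_internal, reproducing the original four bytes exactly."""
    if value < 0x10000:
        raise ValueError("BMP values need a mapping table")
    linear = value - 0x10000 + SUPPLEMENTARY_BASE
    linear, d4 = divmod(linear, 10)
    linear, d3 = divmod(linear, 126)
    d1, d2 = divmod(linear, 10)
    if d1 > 0xFE - 0x81:
        raise ValueError("value is beyond the last four-byte sequence")
    return bytes([0x81 + d1, 0x30 + d2, 0x81 + d3, 0x30 + d4])


def is_unicode(value: int) -> bool:
    """True only for values that are real Unicode code points."""
    return value <= 0x10FFFF
```

    Under this scheme <E3 32 9A 36> would become the internal value 0x110000,
    which is_unicode rejects, and from_internal would restore the same four
    bytes when the file is saved back as GB-18030.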

    I'm pretty sure that there are no such legacy GB mappings, and I doubt that China
    will ever want to map characters to extra-Unicode codepoints in GB-18030 ...
    they seem far more interested in trying to force everyone else to accept their
    unwanted characters in the BMP than putting them in some limbo beyond Plane 16.

    Andrew


