Re: Roundtripping Solved

From: Doug Ewell (
Date: Wed Dec 15 2004 - 10:27:48 CST

    Arcane Jill <arcanejill at ramonsky dot com> wrote:

    > DEFINITION - "f" is a function which maps an arbitrary octet stream to
    > a sequence of Unicode characters, such that (1) any substring which
    > happens to be valid UTF-8 is mapped to the sequence of Unicode
    > characters which would have been produced by UTF-8, and (2) all
    > remaining single octets, xx (with xx necessarily such that 0x80 <= xx
    > <= 0xFF) are each mapped to the sequence: { U+0C55E3, U+01ED7A,
    > U+05FDCB, U+09C351, U+07E168, U+0BBC80, U+107C09, U+0BA458, U+064188,
    > U+048375, U+08ACE0, U+031DEF, U+00xx } (I got those numbers from a
    > true random number generator).
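
As a rough illustration of the quoted definition, here is a minimal Python sketch of "f". It assumes a greedy left-to-right reading of "any substring which happens to be valid UTF-8" and relies on Python's strict UTF-8 decoder to decide validity; the names ESCAPE_PREFIX and f are chosen here for illustration and are not part of the original proposal.

```python
# Escape prefix copied from the quoted definition: each octet that is not
# part of valid UTF-8 becomes these twelve code points followed by U+00xx.
ESCAPE_PREFIX = "".join(chr(cp) for cp in (
    0x0C55E3, 0x01ED7A, 0x05FDCB, 0x09C351, 0x07E168, 0x0BBC80,
    0x107C09, 0x0BA458, 0x064188, 0x048375, 0x08ACE0, 0x031DEF,
))


def f(octets: bytes) -> str:
    """Decode valid UTF-8 as usual; escape every remaining octet."""
    out = []
    pos = 0
    while pos < len(octets):
        try:
            # Try to decode everything that is left in one go.
            out.append(octets[pos:].decode("utf-8"))
            break
        except UnicodeDecodeError as err:
            # err.start is relative to the slice; convert to an absolute offset.
            bad = pos + err.start
            # Everything before the bad octet decoded cleanly.
            out.append(octets[pos:bad].decode("utf-8"))
            # Escape exactly one octet, then resume right after it.
            out.append(ESCAPE_PREFIX + chr(octets[bad]))
            pos = bad + 1
    return "".join(out)
```

Under these assumptions, f(b"a\xffb") yields "a", then the twelve-code-point prefix and U+00FF, then "b". Only one octet is escaped per failure, so the trailing octets of a truncated multi-byte sequence are each escaped separately.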

    Reminds me of Masahiko Maedera's "UTF-16X" proposal, which used triples
    of code points in the block U+EExxx to represent values above 0x10FFFF,
    under the (false) assumption that such a thing was needed.

    Of course, Jill's scheme uses non-private-use Unicode scalar values to
    achieve what is essentially a private-use function, so this is still
    non-conformant. (A similar scheme that only used code points from the
    Plane 0, Plane 15, and Plane 16 PUAs would be fine.) But I gather that
    Lars isn't too worried about being non-conformant, or we wouldn't be
    having this thread.
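
The private-use condition mentioned above can be checked mechanically. The following Python sketch tests whether an escape sequence draws only on the three private-use ranges (Plane 0, Plane 15, Plane 16). The helper names and HYPOTHETICAL_PREFIX are made up for illustration; QUOTED_PREFIX is the sequence from the quoted definition.

```python
# Private-use ranges in the Unicode Standard.
PUA_RANGES = (
    (0xE000, 0xF8FF),      # Plane 0 Private Use Area
    (0xF0000, 0xFFFFD),    # Plane 15 Supplementary Private Use Area-A
    (0x100000, 0x10FFFD),  # Plane 16 Supplementary Private Use Area-B
)


def is_private_use(cp: int) -> bool:
    """Return True if the code point lies in one of the PUA ranges."""
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)


def uses_only_private_use(code_points) -> bool:
    """Return True if every code point in the sequence is private-use."""
    return all(is_private_use(cp) for cp in code_points)


# The escape prefix from the quoted definition.
QUOTED_PREFIX = (
    0x0C55E3, 0x01ED7A, 0x05FDCB, 0x09C351, 0x07E168, 0x0BBC80,
    0x107C09, 0x0BA458, 0x064188, 0x048375, 0x08ACE0, 0x031DEF,
)

# A hypothetical replacement prefix drawn from the three PUAs (illustration only).
HYPOTHETICAL_PREFIX = (0xE123, 0xF1234, 0x10ABCD)

# Code points of the quoted prefix that fall outside the PUAs.
outside = [f"U+{cp:06X}" for cp in QUOTED_PREFIX if not is_private_use(cp)]
```

With the quoted prefix, uses_only_private_use(QUOTED_PREFIX) is False, while the hypothetical all-PUA prefix passes. A full scheme would also need the final U+00xx element replaced by a private-use code point.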

    -Doug Ewell
     Fullerton, California