Re: Invalid code points

From: Doug Ewell
Date: Sun May 31 2009 - 20:50:09 CDT

    William J Poser <wjposer at ldc dot upenn dot edu> wrote:

    > If I understand Hans Aberg's point, he means that one can abstract the
    > mapping from the non-negative integers to byte sequences used by UTF-8
    > away from Unicode and use it for other purposes. One could, for
    > example, have a "UTF-8" encoding of the TRON indexed character set, or
    > of Nelson numbers. In this sense, there is "UTF-8", the integer->byte
    > sequence mapping, and UTF-8, the Unicode transformation format that
    > uses this mapping. This seems to me to be a perfectly valid point.
    > However, so as to avoid confusion, we ought to call them different
    > things, and since the "U" of "UTF-8" stands for "Unicode", it is the
    > mapping in the abstract that ought to be given another name, perhaps
    > the "Thompson mapping" or "diner encoding".

    Oh, absolutely. You can use the transformation for anything you like,
    and modify it to suit your needs. You can extend it to cover the
    original 31-bit range, and to encode the values 0xD800 through 0xDFFF.
    You can even explain that it is derived from UTF-8.

    What you must not do, though, is call the resulting transformation
    "UTF-8," or anything that people will have a reasonable chance of
    confusing with the real UTF-8, such as "UTF-8X."
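
To make the distinction concrete, here is a minimal sketch of the kind of extended mapping described above: the same bit layout, applied to any integer in the original 31-bit range, with no special treatment of 0xD800 through 0xDFFF. The function name `thompson_encode` and the table `_RANGES` are hypothetical names chosen for this illustration (borrowing the "Thompson mapping" suggestion quoted above); they do not come from any standard or library. Because it accepts values that real UTF-8 forbids, its output is not UTF-8.

```python
# Inclusive upper bound and lead-byte marker for each sequence length (1..6).
# This is the original 31-bit layout, not the range real UTF-8 permits.
_RANGES = [
    (0x7F,       0x00),  # 1 byte:  0xxxxxxx
    (0x7FF,      0xC0),  # 2 bytes: 110xxxxx 10xxxxxx
    (0xFFFF,     0xE0),  # 3 bytes: 1110xxxx + 2 continuation bytes
    (0x1FFFFF,   0xF0),  # 4 bytes: 11110xxx + 3 continuation bytes
    (0x3FFFFFF,  0xF8),  # 5 bytes: 111110xx + 4 continuation bytes
    (0x7FFFFFFF, 0xFC),  # 6 bytes: 1111110x + 5 continuation bytes
]


def thompson_encode(n: int) -> bytes:
    """Map a non-negative integer below 2**31 to a byte sequence.

    Hypothetical helper for illustration only. Unlike a conforming UTF-8
    encoder, it does not reject 0xD800..0xDFFF or values above 0x10FFFF.
    """
    if not 0 <= n <= 0x7FFFFFFF:
        raise ValueError("value outside the 31-bit range")

    # Pick the shortest sequence length whose range contains n.
    for length, (limit, marker) in enumerate(_RANGES, start=1):
        if n <= limit:
            break

    # Peel off six bits at a time, lowest first, for the continuation bytes.
    tail = []
    for _ in range(length - 1):
        tail.append(0x80 | (n & 0x3F))
        n >>= 6

    # The remaining high bits go into the lead byte.
    return bytes([marker | n] + tail[::-1])
```

For integers that are Unicode scalar values, this agrees with Python's built-in encoder: `thompson_encode(0xE9)` and `chr(0xE9).encode("utf-8")` both give the bytes C3 A9. For 0xD800 it yields ED A0 80, which a conforming UTF-8 decoder must reject, and that is exactly why the mapping needs a name of its own.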

    Doug Ewell  *  Thornton, Colorado, USA  *  RFC 4645  *  UTN #14