Date: Sun May 31 2009

    > There is only one UTF-8, the one defined by Unicode and ISO/IEC 10646,
    > which maps valid Unicode/10646 scalar values to sequences of bytes.
    > Anything else is not UTF-8. Keep repeating this to yourself.

    If I understand Hans Aberg's point, he means that one can abstract
    the mapping from the non-negative integers to byte sequences used by
    UTF-8 away from Unicode and use it for other purposes. One could,
    for example, have a "UTF-8" encoding of the TRON indexed character
    set, or of Nelson numbers. In this sense, there is "UTF-8", the
    integer->byte sequence mapping, and UTF-8, the Unicode transformation
    format that uses this mapping. This seems to me to be a perfectly
    valid point.
    However, so as to avoid confusion, we ought to call them different
    things, and since the "U" of "UTF-8" stands for "Unicode", it is the
    mapping in the abstract that ought to be given another name, perhaps
    the "Thompson mapping" or "diner encoding".
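
    To make the distinction concrete, here is a minimal sketch in Python.
    The names thompson_encode and utf8_encode are hypothetical, chosen
    only for this illustration. The first function is the abstract
    mapping, assuming the original one-to-six-byte layout that covers
    integers below 2**31; the second is the same mapping restricted to
    Unicode scalar values, which is what the quoted text calls UTF-8.

```python
def thompson_encode(n: int) -> bytes:
    """Abstract integer -> byte sequence mapping (hypothetical name).

    Assumes the original 1-6 byte layout, so any integer in
    range(0, 2**31) is accepted, whether or not it is a Unicode
    scalar value.
    """
    if n < 0 or n >= 1 << 31:
        raise ValueError("integer out of range for the 1-6 byte layout")
    if n < 0x80:
        return bytes([n])  # one byte, high bit clear

    # Pick the shortest form: exclusive upper limits for 2..6 bytes.
    for length, limit in ((2, 1 << 11), (3, 1 << 16), (4, 1 << 21),
                          (5, 1 << 26), (6, 1 << 31)):
        if n < limit:
            break

    # Lead byte prefix: `length` one bits followed by a zero bit.
    lead_prefix = (0xFF << (8 - length)) & 0xFF

    out = []
    for _ in range(length - 1):
        out.append(0x80 | (n & 0x3F))  # continuation byte: 10xxxxxx
        n >>= 6
    out.append(lead_prefix | n)  # remaining high bits go in the lead byte
    return bytes(reversed(out))


def utf8_encode(n: int) -> bytes:
    """The same mapping restricted to Unicode scalar values."""
    if not 0 <= n <= 0x10FFFF or 0xD800 <= n <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    return thompson_encode(n)
```

    Under these assumptions, thompson_encode(0xD800) and
    thompson_encode(0x7FFFFFFF) both yield byte sequences, while
    utf8_encode rejects both inputs: the former is a surrogate code
    point and the latter lies beyond U+10FFFF. An index into some other
    character set could be passed through thompson_encode, but the
    result would not be UTF-8 in the quoted sense.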


