    Markus Scherer scripsit:

    > UTF-8 is useful because it's simple, and supported just about everywhere -
    > but it's otherwise hardly optimal for anything.

    You entirely omit its principal advantage, sine qua non: it's maximally
    ASCII-compatible, using bytes 0x00 to 0x7F to represent ASCII characters and
    nothing else.
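
    The property can be checked mechanically. What follows is a minimal
    sketch, not anything from the original message: the function name
    ascii_bytes_mean_ascii and the sample string are made up for
    illustration, and it relies only on Python's built-in UTF-8 and UTF-16
    codecs.

```python
def ascii_bytes_mean_ascii(text: str) -> bool:
    """Return True if, in the UTF-8 form of `text`, bytes 0x00-0x7F occur
    only as the one-byte encodings of ASCII characters.

    Hypothetical helper, written for illustration only.
    """
    for ch in text:
        encoded = ch.encode("utf-8")
        if ord(ch) < 0x80:
            # An ASCII character must encode to exactly its own byte value.
            if encoded != bytes([ord(ch)]):
                return False
        elif any(b < 0x80 for b in encoded):
            # A non-ASCII character must never yield a byte in the ASCII range.
            return False
    return True


# Illustrative sample: ASCII, Latin-1, CJK, and a supplementary-plane character.
sample = "naive caf\u00e9, \u65e5\u672c\u8a9e, \U0001F600"
print(ascii_bytes_mean_ascii(sample))

# For contrast, UTF-16 places an ASCII-range byte (0x00) inside U+00E9.
print("\u00e9".encode("utf-16-le"))
```

    The second print shows why the same claim cannot be made for UTF-16:
    a byte-oriented ASCII parser would see a NUL in the middle of the text.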

    Mark Crispin's UTF-9 (not to be confused with Jerome Abela's) is also
    excellent, although it only makes sense on 36-bit systems, which most of
    us don't have. A precis:

    Code points (base 2)     UTF-9 code units (base 2)
    0000000000000abcdefgh    0abcdefgh
    00000abcdefghijklmnop    1abcdefgh 0ijklmnop
    abcdefghijklmnopqrstu    1000abcde 1fghijklm 0nopqrstu
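
    The table above can be sketched in a few lines of Python. This is an
    illustration built only from the three rows shown, not Crispin's
    reference code: the names utf9_encode and utf9_decode are hypothetical,
    each 9-bit code unit is modelled as an int from 0 to 0x1FF, and the
    input range is assumed to be U+0000 to U+10FFFF.

```python
from typing import Iterable, List

CONTINUE = 0x100  # high bit of a 9-bit code unit: another unit follows


def utf9_encode(code_point: int) -> List[int]:
    """Encode one code point as a list of 9-bit code units (hypothetical helper)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point out of range")
    if code_point <= 0xFF:
        # Row 1: a single unit with the high bit clear.
        return [code_point]
    if code_point <= 0xFFFF:
        # Row 2: high octet with the continue bit, then the low octet.
        return [CONTINUE | (code_point >> 8), code_point & 0xFF]
    # Row 3: top 5 bits, middle octet, low octet.
    return [
        CONTINUE | (code_point >> 16),
        CONTINUE | ((code_point >> 8) & 0xFF),
        code_point & 0xFF,
    ]


def utf9_decode(units: Iterable[int]) -> List[int]:
    """Decode 9-bit code units back into code points (hypothetical helper)."""
    code_points = []
    value = 0
    pending = False
    for unit in units:
        # Each unit contributes its low 8 bits, most significant first.
        value = (value << 8) | (unit & 0xFF)
        if unit & CONTINUE:
            pending = True
        else:
            code_points.append(value)
            value = 0
            pending = False
    if pending:
        raise ValueError("truncated sequence")
    return code_points


for cp in (0x41, 0xE9, 0x20AC, 0x1F600):
    print(hex(cp), [format(u, "09b") for u in utf9_encode(cp)])
```

    The sketch does no checking for overlong forms or surrogates; it only
    mirrors the bit layout in the table.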

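    This is almost as good as Latin-1 for its repertoire (9 bits per
    character instead of 8), only minutely worse than UTF-16 for the rest of
    the BMP (18 bits instead of 16), and beats all other encodings for the
    other planes (27 bits, against 32 for UTF-8, UTF-16, and UTF-32).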
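
    The size comparison is easy to reproduce. A small sketch, assuming the
    UTF-9 layout in the table (one, two, or three 9-bit units) and using
    Python's built-in codecs for the other encodings; the names utf9_units
    and bits_per_character are hypothetical.

```python
def utf9_units(code_point: int) -> int:
    """Number of 9-bit code units UTF-9 needs, per the table (hypothetical helper)."""
    if code_point <= 0xFF:
        return 1
    if code_point <= 0xFFFF:
        return 2
    return 3


def bits_per_character(code_point: int) -> dict:
    """Bits needed for one code point in each encoding form."""
    ch = chr(code_point)
    return {
        "UTF-9": 9 * utf9_units(code_point),
        "UTF-8": 8 * len(ch.encode("utf-8")),
        "UTF-16": 8 * len(ch.encode("utf-16-be")),
        "UTF-32": 8 * len(ch.encode("utf-32-be")),
    }


# One sample each from Latin-1, the rest of the BMP, and a supplementary plane.
for cp in (0xE9, 0x20AC, 0x1F600):
    print(f"U+{cp:04X}", bits_per_character(cp))
```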

