Re: Nicest UTF

From: Arcane Jill
Date: Mon Dec 06 2004 - 02:39:24 CST

    Probably a dumb question, but how come nobody's invented "UTF-24" yet? I
    just made that up, it's not an official standard, but one could easily
    define UTF-24 as UTF-32 with the most-significant byte (which is always
    zero) removed, hence all characters are stored in exactly three bytes and
    all are treated equally. You could have UTF-24LE and UTF-24BE variants, and
    even UTF-24 BOMs. Of course, I'm not suggesting this is a particularly
    brilliant idea, but I just wonder why no-one's suggested it before.
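
    As a rough illustration of the "UTF-24" idea above (not a real standard;
    the function names and the byteorder parameter are made up for this
    sketch), each code point is written as exactly three bytes, i.e. UTF-32
    with the always-zero top byte dropped:

```python
def utf24_encode(text, byteorder="big"):
    """Encode each code point as exactly 3 bytes (hypothetical UTF-24BE/LE)."""
    # "big" corresponds to the imagined UTF-24BE, "little" to UTF-24LE.
    return b"".join(ord(ch).to_bytes(3, byteorder) for ch in text)


def utf24_decode(data, byteorder="big"):
    """Decode hypothetical UTF-24 bytes back into a string."""
    if len(data) % 3:
        raise ValueError("UTF-24 data length must be a multiple of 3")
    # chr() raises ValueError for values above 0x10FFFF, which rejects
    # three-byte groups that are not valid code points.
    return "".join(
        chr(int.from_bytes(data[i:i + 3], byteorder))
        for i in range(0, len(data), 3)
    )


if __name__ == "__main__":
    sample = "A\u00e9\U0001F600"
    encoded = utf24_encode(sample)
    print(encoded.hex(" "))
    print(utf24_decode(encoded) == sample)
```

    The big-endian form is the UTF-32BE encoding with the first byte of every
    four-byte unit removed, so character N always starts at byte offset 3*N.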

    (And then of course, there's UTF-21, in which blocks of 21 bits are
    concatenated, so that eight Unicode characters will be stored in every 21
    bytes - and not to mention UTF-20.087462841250343, in which a plain text
    document is simply regarded as one very large integer expressed in radix
    1114112, and whose UTF-20.087462841250343 representation is simply that
    number expressed in binary. But now I'm getting /very/ silly - please don't
    take any of this seriously.) :-)
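
    The arithmetic behind the two joke formats can be checked with a small
    sketch. The names utf21_encode and utf21_decode are invented here, and the
    bit order (most significant bit first, zero padding at the end) is an
    assumption, since the message does not specify one:

```python
import math


def utf21_encode(text):
    """Concatenate 21-bit code points, MSB first, zero-padded to a whole byte."""
    bits = 0
    for ch in text:
        bits = (bits << 21) | ord(ch)
    nbits = 21 * len(text)
    pad = -nbits % 8  # 0..7 padding bits at the end
    return (bits << pad).to_bytes((nbits + pad) // 8, "big")


def utf21_decode(data):
    """Inverse of utf21_encode; padding is under 21 bits, so the count is unambiguous."""
    total = 8 * len(data)
    count = total // 21
    bits = int.from_bytes(data, "big") >> (total - 21 * count)
    return "".join(
        chr((bits >> (21 * (count - 1 - i))) & 0x1FFFFF) for i in range(count)
    )


if __name__ == "__main__":
    # Eight characters occupy 8 * 21 = 168 bits = 21 bytes.
    print(len(utf21_encode("ABCDEFGH")))
    # Bits per character when text is one big number in radix 1114112 (0x110000).
    print(math.log2(0x110000))
```

    Since 1114112 = 17 * 2^16, the radix figure is 16 + log2(17), which is
    where the 20.087... in the name comes from.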

    The "UTF-24" thing seems a reasonably sensible question though. Is it just
    that we don't like it because some processors have alignment restrictions or

    Arcane Jill

    -----Original Message-----
    From: Marcin 'Qrczak' Kowalczyk
    Sent: 02 December 2004 16:59
    Subject: Re: Nicest UTF

    "Arcane Jill" writes:
    > Oh for a chip with 21-bit wide registers!
    Not 21-bit but 20.087462841250343-bit :-)

    __("< Marcin Kowalczyk
