UTF-24

From: Pim Blokland (pblokland@planet.nl)
Date: Thu Apr 03 2003 - 14:05:23 EST


    All this talk about these higher-plane characters - you know, plane
    1 and above; let's call them MathText characters for short - has got
    me wondering.

    Why is there no UTF-24?

    See, these MathText characters take up a lot of space. No matter how
    you encode them (UTF-8, UTF-16 or UTF-32), they are always 4 bytes
    long. Now if we had UTF-24, they would take up only 3 bytes.
    And since the Unicode code point range is formally defined to run no
    higher than U+10FFFF, which fits in 21 bits and so in 3 bytes, I see
    no reason why no-one has ever gone to the trouble of defining a
    3-byte storage method.
    Implementation would be easy; there would be only two variants,
    UTF-24LE and UTF-24BE, and that's it. No juggling with bits like in
    UTF-8 and UTF-16 or anything complicated like that. Just the plain
    character values, just like in UTF-32, but using only 75% of the
    storage.
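
    To make the idea concrete, here is a rough sketch of what such a
    scheme might look like. UTF-24 is not a defined encoding form, so
    the names utf24_encode and utf24_decode and the byteorder parameter
    are made up for illustration. The sample character, U+1D400, is just
    one example of a plane 1 character.

```python
# Hypothetical UTF-24 codec: NOT a real or standardized encoding form.
# Each code point is stored as its plain value in exactly 3 bytes.

MAX_CODE_POINT = 0x10FFFF


def utf24_encode(text: str, byteorder: str = "big") -> bytes:
    """Encode text as 3 bytes per code point ("big" = BE, "little" = LE)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"surrogate code point U+{cp:04X} not allowed")
        out += cp.to_bytes(3, byteorder)
    return bytes(out)


def utf24_decode(data: bytes, byteorder: str = "big") -> str:
    """Decode 3-byte units back into a string, rejecting invalid values."""
    if len(data) % 3 != 0:
        raise ValueError("input length is not a multiple of 3 bytes")
    chars = []
    for i in range(0, len(data), 3):
        cp = int.from_bytes(data[i:i + 3], byteorder)
        if cp > MAX_CODE_POINT or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"invalid code point value 0x{cp:06X}")
        chars.append(chr(cp))
    return "".join(chars)


# Compare storage for one plane 1 character across encodings.
sample = "\U0001D400"  # MATHEMATICAL BOLD CAPITAL A
sizes = {
    "utf-8": len(sample.encode("utf-8")),
    "utf-16": len(sample.encode("utf-16-be")),
    "utf-32": len(sample.encode("utf-32-be")),
    "utf-24 (hypothetical)": len(utf24_encode(sample)),
}
print(sizes)
```

    Note that in this sketch every character takes 3 bytes, so a plain
    "A" costs 3 bytes as well, against 1 in UTF-8 and 2 in UTF-16. The
    saving only applies to text made up mostly of plane 1 and above.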

    Comments anyone?

    Pim Blokland


