Re: UTF-24

From: David Starner
Date: Thu Apr 03 2003 - 15:01:13 EST

    On Thu, Apr 03, 2003 at 09:05:23PM +0200, Pim Blokland wrote:
    > Why is there no UTF-24?

    Why? UTF-24 will almost invariably be larger than UTF-16, unless you are
    talking about a document in Old Italic or Gothic. The math alphanumeric
    characters will almost always be combined with enough ASCII to make
    UTF-8 a win, and if not, enough BMP characters to make UTF-16 a win.
    Modern computers don't deal with 24-bit chunks well; in memory, they'd
    take up 32 bits apiece, unless you declared them packed, and then
    they'd be a lot slower than UTF-16 or UTF-32. And if you're storing to
    disk, you may as well use BOCU or SCSU (you're already going
    non-standard), or use standard compression with UTF-8, UTF-16, BOCU or
    SCSU. SCSU or BOCU compressed should take up half the space of UTF-24,
    if that.
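
    The size argument above can be checked with a rough sketch. "UTF-24"
    is not a real encoding, so the code below simply assumes a fixed
    3 bytes per code point, big-endian; the function names and sample
    strings are made up for illustration.

```python
# Sketch only: "UTF-24" is hypothetical, assumed here to be a fixed-width
# encoding of 3 bytes per code point (big-endian). All names are illustrative.

def pack_utf24(text: str) -> bytes:
    """Pack each code point into exactly 3 bytes (the 'packed' layout)."""
    return b"".join(ord(ch).to_bytes(3, "big") for ch in text)


def unpack_utf24(data: bytes) -> str:
    """Inverse of pack_utf24: read the data back 3 bytes at a time."""
    return "".join(
        chr(int.from_bytes(data[i:i + 3], "big"))
        for i in range(0, len(data), 3)
    )


def encoded_sizes(text: str) -> dict:
    """Byte counts for the same text in each encoding form (no BOM)."""
    return {
        "UTF-8": len(text.encode("utf-8")),
        "UTF-16": len(text.encode("utf-16-le")),
        "UTF-24 (hypothetical)": len(pack_utf24(text)),
        "UTF-32": len(text.encode("utf-32-le")),
    }


if __name__ == "__main__":
    samples = {
        "ASCII": "Why is there no UTF-24?",
        "BMP (Greek)": "\u0393\u03b9\u03b1\u03c4\u03af;",
        "Gothic": "\U00010330\U00010331\U00010332\U00010333",
    }
    for name, text in samples.items():
        print(name, encoded_sizes(text))
```

    Under that assumption, ASCII and BMP text come out smaller in UTF-8
    or UTF-16, and only text made up entirely of supplementary-plane
    characters (such as Gothic) comes out smaller in the 3-byte form.
    The sketch says nothing about speed, and it does not try SCSU or
    BOCU, since Python's standard library has no codec for either.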

    David Starner -
    It's the terror of knowing/What this world is about
    Watching some good friends/Screaming 'Let me out'
       -- Queen, "Under Pressure"