Re: Nicest UTF

From: Doug Ewell (dewell@adelphia.net)
Date: Mon Dec 06 2004 - 10:22:13 CST

Next message: Peter Kirk: "Re: No Invisible Character - NBSP at the start of a word"

Previous message: Dean Snyder: "Re: No Invisible Character - NBSP at the start of a word"
In reply to: Arcane Jill: "Re: Nicest UTF"
Next in thread: Philippe Verdy: "Re: Nicest UTF.. UTF-9, UTF-36, UTF-80, UTF-64, ..."

Arcane Jill <arcanejill at ramonsky dot com> wrote:

> Probably a dumb question, but how come nobody's invented "UTF-24" yet?
> I just made that up, it's not an official standard, but one could
> easily define UTF-24 as UTF-32 with the most-significant byte (which
> is always zero) removed, hence all characters are stored in exactly
> three bytes and all are treated equally. You could have UTF-24LE and
> UTF-24BE variants, and even UTF-24 BOMs. Of course, I'm not suggesting
> this is a particularly brilliant idea, but I just wonder why no-one's
> suggested it before.

It has been suggested before, by Pim Blokland on April 3, 2003, in a
message titled "UTF-24." If you get the digest, it's in Digest V3 #79.

> The "UTF-24" thing seems a reasonably sensible question though. Is it
> just that we don't like it because some processors have alignment
> restrictions or something?

Almost all do. In addition, no programming language I know of has a
3-byte-wide integer data type (maybe INTERCAL does), so the efficiency
of UTF-24 would be wasted in software as well as in hardware.
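The lack of a 3-byte integer type shows up as soon as you try to code the scheme quoted above. Below is a minimal Python sketch, assuming the layout Arcane Jill describes: UTF-32 with its always-zero top byte dropped, in big- or little-endian order. The names `encode_utf24` and `decode_utf24` are hypothetical. This is not Doug's implementation, and "UTF-24" is not a standard encoding form. Because there is no 24-bit integer type, every code point has to be packed into three bytes and unpacked from them by hand.

```python
# Hypothetical "UTF-24": not a standard encoding form. The layout follows the
# description quoted above (UTF-32 minus its always-zero most-significant byte).

MAX_CODE_POINT = 0x10FFFF


def encode_utf24(text: str, byteorder: str = "big") -> bytes:
    """Encode text as hypothetical UTF-24BE ("big") or UTF-24LE ("little")."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        # Like UTF-32, refuse surrogate code points (Python str can hold them).
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"surrogate code point U+{cp:04X} is not encodable")
        # No 24-bit integer type exists, so write exactly three bytes by hand.
        out += cp.to_bytes(3, byteorder)
    return bytes(out)


def decode_utf24(data: bytes, byteorder: str = "big") -> str:
    """Decode hypothetical UTF-24BE/LE bytes back into a str."""
    if len(data) % 3 != 0:
        raise ValueError("length is not a multiple of 3")
    chars = []
    for i in range(0, len(data), 3):
        # Read three bytes into an ordinary (wider) Python int.
        cp = int.from_bytes(data[i:i + 3], byteorder)
        if cp > MAX_CODE_POINT or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"invalid code unit at offset {i}: 0x{cp:06X}")
        chars.append(chr(cp))
    return "".join(chars)


if __name__ == "__main__":
    print(encode_utf24("a\U0001F600"))            # b'\x00\x00a\x01\xf6\x00'
    print(encode_utf24("a\U0001F600", "little"))  # b'a\x00\x00\x00\xf6\x01'
```

A BOM needs no special handling in this sketch: U+FEFF encodes as `00 FE FF` in big-endian order and `FF FE 00` in little-endian order, so prefixing the text with `"\ufeff"` produces one.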

Besides that, there were the usual protests that supplementary
characters would be vanishingly rare in the context of "normal" text,
and that one should use compression (SCSU/BOCU or general-purpose
compression tools) if size is an issue.
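The "vanishingly rare" argument can be made concrete. For a text of n characters, s of which are supplementary, UTF-16 needs 2n + 2s bytes (a surrogate pair per supplementary character) while a fixed three-byte form needs 3n. The three-byte form is therefore smaller only when 3n < 2n + 2s, that is, when more than half of the characters are supplementary. The Python sketch below compares sizes. The helper name `encoded_sizes` is hypothetical, and the UTF-24 figure simply assumes three bytes per character, as in the quoted proposal.

```python
def encoded_sizes(text: str) -> dict[str, int]:
    """Byte counts for one text under several encoding forms (no BOMs)."""
    return {
        "UTF-8": len(text.encode("utf-8")),
        "UTF-16": len(text.encode("utf-16-le")),  # 2 or 4 bytes per character
        "UTF-24 (hypothetical)": 3 * len(text),   # always 3 bytes per character
        "UTF-32": len(text.encode("utf-32-le")),  # always 4 bytes per character
    }


if __name__ == "__main__":
    # All-BMP text: UTF-16 wins easily (24 bytes vs. 36).
    print(encoded_sizes("Hello, world"))
    # Two of three characters supplementary: now the three-byte form wins (9 vs. 10).
    print(encoded_sizes("\U0001F600\U0001F600a"))
```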

None of this stopped me from experimentally implementing it, of course,
but I haven't touched it since finishing the implementation.

-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/


This archive was generated by hypermail 2.1.5 : Mon Dec 06 2004 - 10:26:04 CST