Re: Unicode forms for internal storage - BOCU-1 speed

From: jcowan@reutershealth.com
Date: Thu Jan 22 2004 - 13:50:47 EST

Next message: Mark Davis: "Re: Unicode forms for internal storage - BOCU-1 speed"

Previous message: Markus Scherer: "Re: problem - non-ASCII characters on Windows command line"
In reply to: Markus Scherer: "Re: Unicode forms for internal storage - BOCU-1 speed"
Next in thread: Mark Davis: "Re: Unicode forms for internal storage - BOCU-1 speed"
Reply: Mark Davis: "Re: Unicode forms for internal storage - BOCU-1 speed"
Reply: Philippe Verdy: "Re: Unicode forms for internal storage - BOCU-1 speed"

Markus Scherer scripsit:

> UTF-8 is useful because it's simple, and supported just about everywhere -
> but it's otherwise hardly optimal for anything.

You entirely omit its principal advantage, sine qua non: it's maximally
ASCII-compatible, using bytes 0x00 to 0x7F to represent ASCII characters and
nothing else.
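
To make that property concrete, here is a minimal Python sketch (an illustration only; the helper name and the sample string are made up for this example). It checks that in UTF-8 a byte below 0x80 appears only where the text has the corresponding ASCII character, and contrasts this with UTF-16, where an ASCII byte value can turn up inside a non-ASCII character.

```python
def ascii_bytes_only_for_ascii(text):
    """Return True if bytes below 0x80 in the UTF-8 form occur only for ASCII characters."""
    for ch in text:
        encoded = ch.encode("utf-8")
        if ord(ch) < 0x80:
            # ASCII character: exactly one byte, equal to its code point.
            if encoded != bytes([ord(ch)]):
                return False
        else:
            # Non-ASCII character: every lead and trail byte is 0x80 or above.
            if any(b < 0x80 for b in encoded):
                return False
    return True


# Hypothetical sample mixing ASCII, Latin-1, CJK, and a supplementary-plane character.
sample = "na\u00efve caf\u00e9 / \u65e5\u672c\u8a9e / \U0001D11E"
print(ascii_bytes_only_for_ascii(sample))

# Contrast: U+012F encodes in UTF-16BE as the bytes 01 2F, so a byte-oriented
# search for "/" (0x2F) gets a false hit there, but not in the UTF-8 form.
print(b"/" in "\u012f".encode("utf-16-be"))
print(b"/" in "\u012f".encode("utf-8"))
```

This is why byte-oriented tools that only care about ASCII delimiters (path separators, NUL terminators, quotes) can process UTF-8 without knowing anything about it.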

Mark Crispin's UTF-9 (not to be confused with Jerome Abela's) is also
excellent, although it makes sense only on 36-bit systems, which most of
us don't have. A precis:

Code points (base 2)     UTF-9 code units (base 2)
0000000000000abcdefgh    0abcdefgh
00000abcdefghijklmnop    1abcdefgh 0ijklmnop
abcdefghijklmnopqrstu    1000abcde 1fghijklm 0nopqrstu

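This is almost as good as Latin-1 for its repertoire (9 bits per character
instead of 8), only minutely worse than UTF-16 for the rest of the BMP
(18 bits instead of 16), and beats all other encodings for the other
planes (27 bits, where UTF-8, UTF-16 and UTF-32 all need 32).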
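
The table can be sketched in a few lines of Python. This is only an illustration of the precis above, not Crispin's own code: the function names are invented here, each 9-bit code unit is held in an ordinary int, and the encoder assumes the shortest form is always used, which is what the table's leading zeros suggest.

```python
def utf9_encode(cp):
    """Encode one code point as a list of 9-bit code units, following the table."""
    if not 0 <= cp <= 0x1FFFFF:
        raise ValueError("code point outside the 21-bit range covered by the table")
    # Split the code point into 8-bit groups, least significant first.
    groups = [cp & 0xFF]
    cp >>= 8
    while cp:
        groups.append(cp & 0xFF)
        cp >>= 8
    groups.reverse()
    # Every unit except the last has the top (ninth) bit set as a continuation flag.
    return [0x100 | g for g in groups[:-1]] + [groups[-1]]


def utf9_decode(units):
    """Decode the code units of a single character back to its code point."""
    cp = 0
    for unit in units:
        cp = (cp << 8) | (unit & 0xFF)  # drop the flag bit, keep the 8 data bits
    return cp


def bits_per_char(ch):
    """Storage cost in bits of one character under each encoding form."""
    return {
        "UTF-9": 9 * len(utf9_encode(ord(ch))),
        "UTF-8": 8 * len(ch.encode("utf-8")),
        "UTF-16": 8 * len(ch.encode("utf-16-be")),
        "UTF-32": 8 * len(ch.encode("utf-32-be")),
    }


# One character each from Latin-1, the rest of the BMP, and a supplementary plane.
for ch in ("\u00e9", "\u65e5", "\U0001D11E"):
    units = utf9_encode(ord(ch))
    assert utf9_decode(units) == ord(ch)
    print(f"U+{ord(ch):04X}", [f"{u:09b}" for u in units], bits_per_char(ch))
```

The low 8 bits of each unit carry data and the ninth bit says whether another unit follows, so four units fit exactly in one 36-bit word.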

-- 
John Cowan                              <jcowan@reutershealth.com>
http://www.ccil.org/~cowan              http://www.reutershealth.com
                Charles li reis, nostre emperesdre magnes,
                Set anz totz pleinz ad ested in Espagnes.

This archive was generated by hypermail 2.1.5 : Thu Jan 22 2004 - 14:50:05 EST