Re: Proposing UTF-21/24

From: Frank Ellermann (nobody@xyzzy.claranet.de)
Date: Mon Jan 22 2007 - 11:58:01 CST

Next message: Asmus Freytag: "Re: Proposing a DOUBLE HYPHEN punctuation mark"

Previous message: Jon Hanna: "Re: Proposing a DOUBLE HYPHEN punctuation mark"
In reply to: Doug Ewell: "Re: Proposing UTF-21/24"
Next in thread: Doug Ewell: "Re: Proposing UTF-21/24"
Reply: Doug Ewell: "Re: Proposing UTF-21/24"

Doug Ewell wrote:

> The greatest roadblock to acceptance of SCSU is its *perception* of
> complexity. It is not nearly as complicated as it is perceived to be,
> and I say this having implemented both simple and optimized encoders as
> well as decoders. Algorithms like MD5 and Punycode and gzip are quite a
> bit more complex

Wait a moment: I've implemented MD5, UTF-1, UTF-7, and BOCU-1, but so far
I have given up on SCSU. To say that it's horrible would be putting it mildly.

One of the nice features of BOCU-1 is that a single error destroys at most
one line. With UTF-8 a single error destroys at most one code point. Try
that with SCSU and its various ways to encode the same piece of text.
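
The UTF-8 half of that claim can be checked with a small sketch. It assumes
a decoder that resynchronizes at the next lead byte, as Python's built-in
UTF-8 codec does with errors="replace". The helper name lost_code_points and
the sample string are made up for illustration.

```python
def lost_code_points(text: str, byte_index: int, new_byte: int) -> int:
    """Overwrite one byte of the UTF-8 form of `text` and count how many
    of the original code points no longer survive decoding."""
    data = bytearray(text.encode("utf-8"))
    data[byte_index] = new_byte
    decoded = bytes(data).decode("utf-8", errors="replace")

    # Length of the common prefix of original and damaged text.
    limit = min(len(text), len(decoded))
    prefix = 0
    while prefix < limit and text[prefix] == decoded[prefix]:
        prefix += 1

    # Length of the common suffix, not overlapping the prefix.
    suffix = 0
    while suffix < limit - prefix and text[-1 - suffix] == decoded[-1 - suffix]:
        suffix += 1

    # Everything between prefix and suffix counts as destroyed.
    return len(text) - prefix - suffix


sample = "naïve € text"  # mixes 1-, 2- and 3-byte sequences
encoded_length = len(sample.encode("utf-8"))

# Try every byte position with every possible replacement value.
worst = max(
    lost_code_points(sample, i, b)
    for i in range(encoded_length)
    for b in range(256)
)
print("worst case, code points lost:", worst)
```

The damaged output may contain extra U+FFFD replacement characters, but the
text before and after the damaged code point comes through unchanged.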

> BOCU-1 is less complex, but more obscure

Not at all. It's a rather smart application of the 3*7 bits idea discussed
in this thread; at some point it uses 1114111 = 2**20 + 2**16 - 1 as the
biggest possible "jump".
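
The arithmetic is quick to verify. The sketch below checks that 1114111 is
the last code point U+10FFFF and that it fits in 21 = 3*7 bits. The helpers
split_3x7 and join_3x7 are hypothetical names for a plain split into three
7-bit groups; that split is an assumption for illustration, not the actual
BOCU-1 byte layout.

```python
MAX_CODE_POINT = 0x10FFFF  # last Unicode code point

# 2**20 supplementary code points plus 2**16 in the BMP, counted from zero.
assert MAX_CODE_POINT == 1114111 == 2**20 + 2**16 - 1

# 21 bits are enough, so three groups of 7 bits cover every value.
assert MAX_CODE_POINT.bit_length() == 21
assert MAX_CODE_POINT < 2**21


def split_3x7(value: int) -> list[int]:
    """Split a value below 2**21 into three 7-bit groups, high group first."""
    if not 0 <= value < 2**21:
        raise ValueError("value does not fit in 21 bits")
    return [(value >> 14) & 0x7F, (value >> 7) & 0x7F, value & 0x7F]


def join_3x7(groups: list[int]) -> int:
    """Inverse of split_3x7."""
    high, mid, low = groups
    return (high << 14) | (mid << 7) | low


groups = split_3x7(MAX_CODE_POINT)
print([hex(g) for g in groups])
print(join_3x7(groups) == MAX_CODE_POINT)
```

The largest difference between two code points, from U+0000 to U+10FFFF, is
this same number, which is why it is the biggest "jump" a difference-based
encoding ever has to represent.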

> an additional problem: its core algorithm is covered under a U.S.
> patent (6,737,994) owned by IBM. Although they currently offer a
> royalty-free license, IBM has been known to change their terms of
> licensing from time to time.

So far they haven't told me that my BOCU-1 script needs a license - okay,
that's no serious objection. IMO nobody needs a special compression for
Unicode anyway. But in theory BOCU-1 is nice, especially when compared
with SCSU.

> memories of the Unisys GIF patent are still too fresh in my mind.

The LZW patent has expired worldwide now. It was always possible to create
uncompressed GIFs; http://purl.net/xyzzy/pub/clear1x1.gif (45 bytes)
vs. clearlzw.gif (43 bytes) is an admittedly silly example.

Frank

