RE: [unicode] UTF-c

From: Doug Ewell (doug@ewellic.org)
Date: Mon Feb 21 2011 - 15:43:52 CST

Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:

> And anyway it is also much simpler to understand and easier to
> implement correctly (not like the sample code given here) than SCSU,

I don't buy this. A simple SCSU encoder, which achieves most of the
benefits of a complex one, is nearly as simple as Cropley's algorithm.

Both the complexity of SCSU and the importance of that complexity
continue to be highly overrated.
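
To give a sense of scale, here is a deliberately minimal SCSU encoder
sketch in Python. It is not the encoder referred to above and not
anyone's reference implementation: it assumes only the eight default
dynamic windows from UTS #6, never defines new windows, quotes other BMP
characters with SQU, and drops briefly into Unicode mode for
supplementary characters. The names scsu_encode_simple and find_window
are made up for this illustration.

```python
# Default dynamic window offsets in SCSU's initial state (UTS #6).
DEFAULT_WINDOWS = [0x0080, 0x00C0, 0x0400, 0x0600, 0x0900, 0x3040, 0x30A0, 0xFF00]

SQ0 = 0x01  # quote one character; a following byte < 0x80 means static window 0
SQU = 0x0E  # quote one UTF-16 code unit (high byte, low byte)
SCU = 0x0F  # switch to Unicode (UTF-16BE) mode
SC0 = 0x10  # SC0..SC7: select dynamic window 0..7
UC0 = 0xE0  # UC0..UC7: leave Unicode mode and select dynamic window 0..7

# C0 controls that may appear unquoted in single-byte mode.
PASS_THROUGH_CONTROLS = {0x00, 0x09, 0x0A, 0x0D}


def find_window(cp, active):
    """Return a default window containing cp, preferring the active one."""
    base = DEFAULT_WINDOWS[active]
    if base <= cp < base + 0x80:
        return active
    for index, base in enumerate(DEFAULT_WINDOWS):
        if base <= cp < base + 0x80:
            return index
    return None


def scsu_encode_simple(text):
    """Encode text as SCSU using only the default windows (hypothetical sketch)."""
    out = bytearray()
    active = 0  # SCSU starts in single-byte mode with window 0 active
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            if cp >= 0x20 or cp in PASS_THROUGH_CONTROLS:
                out.append(cp)            # ASCII passes through unchanged
            else:
                out += bytes([SQ0, cp])   # other C0 controls must be quoted
        elif cp > 0xFFFF:
            # Supplementary character: emit the surrogate pair in Unicode
            # mode, then return to single-byte mode with the same window.
            v = cp - 0x10000
            hi, lo = 0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)
            out += bytes([SCU, hi >> 8, hi & 0xFF, lo >> 8, lo & 0xFF, UC0 + active])
        else:
            window = find_window(cp, active)
            if window is None:
                out += bytes([SQU, cp >> 8, cp & 0xFF])  # no window: quote it
            else:
                if window != active:
                    out.append(SC0 + window)             # switch windows
                    active = window
                out.append(0x80 + cp - DEFAULT_WINDOWS[window])
    return bytes(out)
```

This sketch does well only on scripts covered by a default window
(Latin-1, Cyrillic, Arabic, Devanagari, kana). Greek, for instance,
costs three bytes per letter here; an encoder that also defines windows
with the SDn tags would avoid that, at the cost of some more code.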

Part of the apparent simplicity of Cropley's algorithm, as viewed from
his "Preliminary Proposal" HTML page, is that it omits a proper
description of the code-page switching mechanism, as well as the "magic
number" definitions of the code pages and the control bytes needed to
introduce them. These are present in the sample code, but to see them,
you have to paw through the UTF-8 conversion code and UI.

> and it is still very highly compressible with standard compression
> algorithms while still allowing very fast processing in memory in its
> decompressed encoded form:

I see no metrics or sample data to back this up. How does Cropley's
algorithm perform with mixed scripts (say Greek and Cyrillic), with
embedded punctuation in the U+2000 block, with Deseret and other
alphabets omitted from the Alphabet table, with larger alphabets where
multiple 64-blocks are needed, with Han and Hangul?
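
The kind of measurement asked for here could be gathered with a small
harness like the following Python sketch. It is only an assumption about
how one might set up the comparison: the sample strings are invented
placeholders rather than real corpus data, zlib stands in for "standard
compression algorithms", and only UTF-8 and UTF-16BE are wired in. A
UTF-c or SCSU encoder would be added to the ENCODERS table as another
function from str to bytes.

```python
import zlib

# Placeholder samples, one per case raised above (not real test data).
SAMPLES = {
    "Greek + Cyrillic": "Ελληνικά και русский язык. " * 20,
    "General Punctuation": "“Quoted” text… with ‘single’ quotes and a dagger†. " * 20,
    "Deseret": "𐐔𐐯𐑅𐐨𐑉𐐯𐐻 " * 20,
    "Han": "統一碼聯盟致力於文字編碼標準。" * 20,
    "Hangul": "유니코드는 모든 문자를 표현합니다. " * 20,
}

# Encoders under test; add further str -> bytes functions here.
ENCODERS = {
    "UTF-8": lambda s: s.encode("utf-8"),
    "UTF-16BE": lambda s: s.encode("utf-16-be"),
}


def measure(samples, encoders):
    """Return (sample, encoder, raw size, zlib-compressed size) rows."""
    rows = []
    for sample_name, text in samples.items():
        for encoder_name, encode in encoders.items():
            raw = encode(text)
            packed = zlib.compress(raw, 9)
            rows.append((sample_name, encoder_name, len(raw), len(packed)))
    return rows


if __name__ == "__main__":
    for sample_name, encoder_name, raw_size, packed_size in measure(SAMPLES, ENCODERS):
        print(f"{sample_name:22} {encoder_name:9} raw={raw_size:5} zlib={packed_size:5}")
```

Repeating a short string twenty times flatters any general-purpose
compressor, so real text in each script would be needed before drawing
conclusions from the numbers.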

--
Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
RFC 5645, 4645, UTN #14 | ietf-languages @ is dot gd slash 2kf0s
