RE: [unicode] UTF-c

From: Doug Ewell (doug@ewellic.org)
Date: Mon Feb 21 2011 - 15:43:52 CST

    Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:

    > And anyway it is also much simpler to understand and easier to
    > implement correctly (not like the sample code given here) than SCSU,

    I don't buy this. A simple SCSU encoder, which achieves most of the
    benefits of a complex one, is nearly as simple as Cropley's algorithm.
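
    To give a sense of scale, here is a minimal sketch of such an encoder
    in Python. It is an illustration under stated assumptions, not code
    from UTS #6 or from any existing implementation: it handles BMP text
    only, stays in single-byte mode, uses only the eight predefined
    dynamic windows, and never redefines a window. The function name
    scsu_encode_simple is hypothetical, and the tag values and window
    offsets are as I recall them from UTS #6, so check them against the
    spec before relying on them.

```python
# Sketch only: BMP text, predefined dynamic windows, no window redefinition.
# Tag values and window offsets are assumed from UTS #6; verify before use.
DYNAMIC_WINDOWS = [0x0080, 0x00C0, 0x0400, 0x0600, 0x0900, 0x3040, 0x30A0, 0xFF00]
SQ0 = 0x01  # quote one character from static window 0 (U+0000..U+007F)
SQU = 0x0E  # quote one 16-bit code unit, big-endian
SC0 = 0x10  # SC0..SC7: switch the active dynamic window

# Control characters that single-byte mode passes through unchanged.
PASS_THROUGH_CONTROLS = {0x00, 0x09, 0x0A, 0x0D}


def find_window(cp, active):
    """Return the index of a predefined window covering cp, or None.

    The active window is checked first so no switch is emitted needlessly.
    """
    order = [active] + [w for w in range(8) if w != active]
    for w in order:
        base = DYNAMIC_WINDOWS[w]
        if base <= cp < base + 0x80:
            return w
    return None


def scsu_encode_simple(text):
    """Encode BMP text as SCSU bytes using only the predefined windows."""
    out = bytearray()
    active = 0  # initial state: single-byte mode, dynamic window 0 active
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            raise ValueError("this sketch handles BMP characters only")
        if cp in PASS_THROUGH_CONTROLS or 0x20 <= cp <= 0x7F:
            out.append(cp)                     # ASCII passes through
        elif cp < 0x20:
            out += bytes([SQ0, cp])            # other C0 controls are quoted
        else:
            w = find_window(cp, active)
            if w is None:
                # No predefined window covers it: quote as 16 bits.
                out += bytes([SQU, cp >> 8, cp & 0xFF])
            else:
                if w != active:
                    out.append(SC0 + w)        # switch window once
                    active = w
                out.append(0x80 + cp - DYNAMIC_WINDOWS[w])
    return bytes(out)
```

    With this sketch, "Москва" comes out as one window-switch byte plus
    six text bytes (7 bytes, against 12 in UTF-8). A script with no
    predefined window, such as Greek, falls back to 3 bytes per character
    here; a somewhat fuller encoder would define a window for it instead,
    which is the main step this sketch leaves out.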

    Both the complexity of SCSU and the importance of that complexity
    continue to be highly overrated.

    Part of the apparent simplicity of Cropley's algorithm, as viewed from
    his "Preliminary Proposal" HTML page, is that it omits a proper
    description of the code-page switching mechanism, as well as the "magic
    number" definitions of the code pages and the control bytes needed to
    introduce them. These are present in the sample code, but to see them,
    you have to paw through the UTF-8 conversion code and UI.

    > and it is still very highly compressible with standard compression
    > algorithms while still allowing very fast processing in memory in its
    > decompressed encoded form :

    I see no metrics or sample data to back this up. How does Cropley's
    algorithm perform with mixed scripts (say Greek and Cyrillic), with
    embedded punctuation in the U+2000 block, with Deseret and other
    alphabets omitted from the Alphabet table, with larger alphabets where
    multiple 64-blocks are needed, with Han and Hangul?

    --
    Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
    RFC 5645, 4645, UTN #14 | ietf-languages @ is dot gd slash 2kf0s
    

