Re: Proposing UTF-21/24

From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Jan 22 2007 - 14:36:45 CST


    Frank Ellermann objected:

    > Doug Ewell wrote:
    >
    > > The greatest roadblock to acceptance of SCSU is its *perception* of
    > > complexity. It is not nearly as complicated as it is perceived to be,
    > > and I say this having implemented both simple and optimized encoders as
    > > well as decoders. Algorithms like MD5 and Punycode and gzip are quite a
    > > bit more complex
    >
    > Wait a moment, I've implemented MD5, UTF-1, UTF-7, and BOCU-1, and so far
    > I gave up on SCSU. To say that it's horrible would be putting it mildly.

    I've got to side with Doug here. As he pointed out, the decoder
    for SCSU is trivial. And in UTN #14 Doug has written the pseudocode
    for an encoder in one page. I implemented the precursor of
    SCSU years ago, and while optimizing the encoder can be tricky,
    there really isn't all that much to it. (The main difference
    between the precursor to SCSU and SCSU itself is that SCSU
    defines a bunch of static windows, whereas the precursor just
    did everything by calculating one dynamic window.)
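
    To illustrate how little a decoder needs, here is a minimal sketch in
    Python. It is not taken from UTN #14 or from any implementation
    mentioned in this thread. It covers only a subset of single-byte mode
    (pass-through bytes, the SCn window-switch tags, and the SQn quote
    tags) and assumes the default window offsets given in the SCSU
    specification (UTS #6). The function name decode_scsu_subset is made
    up for this example; window redefinition (SDn, SDX) and Unicode mode
    are deliberately left out.

```python
# Partial sketch of an SCSU decoder: single-byte mode only.
# Window offsets are the defaults listed in UTS #6 (assumed here).
STATIC_OFFSETS = [0x0000, 0x0080, 0x0100, 0x0300,
                  0x2000, 0x2080, 0x2100, 0x3000]
DYNAMIC_OFFSETS = [0x0080, 0x00C0, 0x0400, 0x0600,
                   0x0900, 0x3040, 0x30A0, 0xFF00]


def decode_scsu_subset(data: bytes) -> str:
    """Decode SCSU bytes that use only pass-through, SCn and SQn."""
    active = 0          # index of the active dynamic window
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        i += 1
        if b >= 0x80:
            # High bytes index into the active dynamic window.
            out.append(chr(DYNAMIC_OFFSETS[active] + (b - 0x80)))
        elif b >= 0x20 or b in (0x00, 0x09, 0x0A, 0x0D):
            # ASCII and the four pass-through controls map to themselves.
            out.append(chr(b))
        elif 0x10 <= b <= 0x17:
            # SCn: switch the active dynamic window.
            active = b - 0x10
        elif 0x01 <= b <= 0x08:
            # SQn: quote one character from window n, no window switch.
            if i >= len(data):
                raise ValueError("truncated SQn sequence")
            q = data[i]
            i += 1
            window = b - 0x01
            if q < 0x80:
                out.append(chr(STATIC_OFFSETS[window] + q))
            else:
                out.append(chr(DYNAMIC_OFFSETS[window] + (q - 0x80)))
        else:
            # SDn, SDX, SQU, SCU and reserved tags are out of scope here.
            raise NotImplementedError(f"tag 0x{b:02X} not handled in this sketch")
    return "".join(out)
```

    For example, decode_scsu_subset(bytes.fromhex("129CBEC1BAB2B0"))
    switches to the Cyrillic window with SC2 and then reads one byte per
    letter. A full decoder adds window definitions and Unicode mode, but
    the loop keeps the same shape.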

    There really isn't any reason, with SCSU, to get bogged down
    in trying to get some kind of theoretical best behavior out
    of the encoder. For all reasonable purposes, good enough is
    good enough. Rather than trying to tweak the optimization of
    an SCSU encoder to gain another 1% in special cases, it makes
    much more sense to simply choose another general compression
    mechanism instead, for those special cases.

    --Ken

    P.S. As for the main topic of this thread, I have to agree with Doug,
    Mark, and David Starner. There is nothing compelling enough
    about the design of UTF-21/24 that would give it any advantage,
    either for storage or for processing, over the existing
    UTF-8 and UTF-16.
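
    The storage side of that comparison can be checked roughly with a
    few lines of Python. This sketch assumes, purely for illustration,
    that UTF-21/24 costs a fixed three bytes per code point; the proposal
    itself is not described in this message, so treat that figure and the
    helper name storage_sizes as assumptions.

```python
def storage_sizes(text: str) -> dict:
    """Byte counts for text under UTF-8, UTF-16, and a fixed 3-byte form."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # utf-16-le avoids counting the byte order mark.
        "utf-16": len(text.encode("utf-16-le")),
        # Assumption: three bytes per code point, as a stand-in for UTF-21/24.
        "fixed-3-byte": 3 * len(text),
    }


if __name__ == "__main__":
    for sample in ("hello", "Москва", "日本語"):
        print(sample, storage_sizes(sample))
```

    Under that assumption, a fixed three-byte form is larger than UTF-8
    for ASCII text and larger than UTF-16 for most BMP text; it only
    comes out ahead for text dominated by supplementary-plane characters.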


