Re: Unicode forms for internal storage - BOCU-1 speed

From: Doug Ewell (
Date: Fri Jan 23 2004 - 01:20:06 EST

    Kenneth Whistler <kenw at sybase dot com> wrote:

    >> I have seen several other informal proposals for "UTF-*" forms/
    >> schemes. All this is just confusive, and their authors should imagine
    >> their own names for reference. What do you think of this idea?

    > It is, indeed, "confusive". Some of us have deliberately contributed
    > to the confusion with tongue-in-cheek additions. See my own
    > UTF-17 (draft-whistler-utf17-00.txt). I would not object if
    > henceforward people referred to that as KW-UTF-17, to avoid
    > confusion. :-)

    A couple of years ago I suggested calling these "XTF's," to distinguish
    them from the official "UTF's."

    I've added a bit to the confusivity, with ha-ha-only-serious schemes
    called DUCK (Doug's Unicode Compression Kludge) and MUCK (Multigraph
    Unicode Compression Kludge), plus something I called "dynamic code
    pages" which never saw the light of day, and probably never will because
    of their really, really bad performance.

    But mostly I've carried other people's jokes (and serious proposals) to
    the logical extreme and beyond, by creating fully functional and tested
    implementations of:

    • UTF-4 by Jill Ramonsky (name provided by John Cowan)
    • UTF-5 by James Seng, Martin Dürst, and Tin Wee Tan
    • UTF-7d5 by Jørg Knappen
    • UTF-8C1 by Markus Scherer
    • UTF-9 by Jerome Abela (not Mark Crispin's version)
    • UTF-17 by Ken
    • UTF-24 by Pim Blokland
    • UTF-64 by Marco and Paul Keinänen
    • UTF-mu by Marco
    • UTF-Z by Marco
    • XTF-3 by Shlomi Tal

    as well as some more serious formats:

    • UTF-1 (the "original" Unicode Transformation Format)
    • UTF-EBCDIC (described in meticulous detail in UTR #16)

    Currently I'm working on a much more useful project: a "clean-room"
    encoder and decoder for BOCU-1, possibly the world's first that doesn't
    just wrap the UTN #6 sample code.

    And on the lighter side, I recently dredged up Misha Wolf's original
    1995 description of RCSU, the predecessor of SCSU, and started
    experimenting with an encoder and decoder:

    -Doug Ewell
     Fullerton, California
