Re: ASCII and Unicode lifespan

From: Dean Snyder (dean.snyder@jhu.edu)
Date: Thu May 19 2005 - 01:08:46 CDT


    Doug Ewell wrote at 10:15 PM on Tuesday, May 17, 2005:

    >Now, in keeping with this, what problems does Unicode present that will
    >lead to its replacement by something better?

    Here, off the top of my head, are some problems with Unicode which,
    cumulatively, could prove its undoing:

      Needless complexity
      Stateful mechanisms
      No support for a clean division between text and meta-text
      Errors in actual content
      Legacy sludge
      Irreversibility

    >How will the "something better" solve these problems without
    >introducing new ones?

    Subsequent encoding efforts will be better because they will have
    learned from the mistakes of earlier encoders ;-)

    Probably the single most important, and extremely simple, step to a
    better encoding would be to require every encoded character to occupy
    exactly 4 bytes.
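
    To make the fixed-width idea concrete, here is a minimal sketch,
    assuming big-endian byte order and Unicode code points as the
    character numbers (in effect what UTF-32BE already does). The
    function names are hypothetical, chosen only for illustration.

```python
def encode_fixed4(text: str) -> bytes:
    """Encode each character as exactly 4 big-endian bytes."""
    return b"".join(ord(ch).to_bytes(4, "big") for ch in text)


def char_at(buf: bytes, index: int) -> str:
    """Fetch the character at `index` by simple offset arithmetic."""
    start = index * 4
    if index < 0 or start + 4 > len(buf):
        raise IndexError("character index out of range")
    return chr(int.from_bytes(buf[start:start + 4], "big"))


# U+12000 is a cuneiform sign outside the Basic Multilingual Plane.
sample = "a\u00e9\U00012000"
encoded = encode_fixed4(sample)
```

    With every character the same width, the n-th character always
    starts at byte 4 * n, so no scanning or state is needed to find it.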

    >How will it meet the challenge of transcoding untold amounts
    >of "legacy" Unicode data?

    Transcoding Unicode data into some new standard could at least be done
    in ways similar to the ways pre-Unicode data is being transcoded into
    Unicode now - an almost trivial pursuit.
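
    For comparison, this is roughly what today's legacy-to-Unicode
    transcoding looks like when a mapping table already exists. It is
    only a sketch using Python's built-in codecs; the helper name
    `transcode` is hypothetical, and a future standard would of course
    need a mapping table of its own, which does not exist yet.

```python
def transcode(data: bytes, source: str, target: str = "utf-8") -> bytes:
    """Decode bytes from a legacy encoding, then re-encode them.

    Unmappable input raises UnicodeError rather than being silently
    replaced, so data loss is visible.
    """
    text = data.decode(source, errors="strict")
    return text.encode(target, errors="strict")


# "café" as stored in ISO 8859-1, converted to UTF-8.
legacy = b"caf\xe9"
converted = transcode(legacy, "iso-8859-1")
```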

    >How will it respond to the inevitable objections from supporters
    >of other encoding systems as Unicode has done?

    Hopefully:
      With no arrogance.
      With broader cooperation.
      With greater deliberation and less haste.
      With more accumulated intelligence.
      With better architectural design.

    Don't get me wrong. I think ISO 10646/Unicode is, for the most part, a
    wonderful pioneering effort to digitize the world's scripts. And there
    is no doubt that all future encoders will make mistakes too. But I do
    believe that hubris, intolerable in such matters, has unfortunately led
    to short-sighted mistakes in both the architecture and content of
    Unicode, mistakes Unicode is saddled with in perpetuity.

    As just one example of the kind of architectural change that could drive
    new encoding schemes, one could propose an encoding design that
    self-references its own mutability, thereby redefining "stability" to
    include not only extensibility but also reversibility. This could be
    accomplished by dedicating, e.g., 7 of the 32 bits in every 4-byte
    character as version indicators.
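
    A minimal sketch of such a layout follows. The bit positions are an
    assumption made for illustration (the version in the high 7 bits,
    the character number in the remaining 25); nothing above fixes
    where the version bits would actually go, and the names are
    hypothetical.

```python
from typing import Tuple

VERSION_BITS = 7
CODE_BITS = 32 - VERSION_BITS          # 25 bits left for the character number
CODE_MASK = (1 << CODE_BITS) - 1


def pack(version: int, code: int) -> bytes:
    """Pack a version and a character number into one 4-byte unit."""
    if not 0 <= version < (1 << VERSION_BITS):
        raise ValueError("version must fit in 7 bits")
    if not 0 <= code <= CODE_MASK:
        raise ValueError("character number must fit in 25 bits")
    return ((version << CODE_BITS) | code).to_bytes(4, "big")


def unpack(unit: bytes) -> Tuple[int, int]:
    """Split a 4-byte unit back into (version, character number)."""
    if len(unit) != 4:
        raise ValueError("a unit is exactly 4 bytes")
    value = int.from_bytes(unit, "big")
    return value >> CODE_BITS, value & CODE_MASK


unit = pack(3, 0x12000)
```

    Seven bits allow 128 distinct versions, and the remaining 25 bits
    allow 33,554,432 character numbers, comfortably more than Unicode's
    current ceiling of U+10FFFF.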

    Dean A. Snyder

    Assistant Research Scholar
    Manager, Digital Hammurabi Project
    Computer Science Department
    Whiting School of Engineering
    218C New Engineering Building
    3400 North Charles Street
    Johns Hopkins University
    Baltimore, Maryland, USA 21218

    office: 410 516-6850
    cell: 717 817-4897
    www.jhu.edu/digitalhammurabi/
    http://users.adelphia.net/~deansnyder/


