Stateful encoding mechanisms

From: Dean Snyder (
Date: Wed May 18 2005 - 23:15:33 CDT

    Alexander Kh. wrote at 7:24 PM on Wednesday, May 18, 2005:

    >That's Microsoft scale gigantism. I can think of many ways to restrict
    >use of Unicode to only non-critical cases where the accuracy of data is
    >of no importance. For example: by using a modified UTF-8 format where
    >a ASCII letter can be used as a switch selector between any local
    >encodings - that method will allow to save A LOT of space for commonly
    >used characters.
    >I think that by biulding extentions to UTF-8, such as a state-machine
    >system, and using small but well-thought encoding tables and fonts one
    >can totally avoid using Unicode, which is sloppy, inaccurate, incomplete
    >and for some strange reason uses character '\0' within a string. This is
    >not to mention its endianness problem. ...

    Stateful mechanisms for plain text encoding are bad, if for no other
    reason than fragment fragility: a fragment cut from a longer text cannot
    be interpreted reliably without the state that was set before it.
    Unfortunately, Unicode does contain some state-machine characters, which
    I think are mistakes, enabling, as they do, fragment ambiguity or
    non-interpretability.

    Here are some:

    Stateful mechanisms that contribute to fragility at the character level -

    Stateful mechanisms that contribute to fragility above the character level -
      Bidirectional Ordering Controls
      Annotation characters
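
As a rough sketch of how the bidirectional controls carry state across a fragment boundary, the following Python checks whether the explicit embedding and override controls in a string are balanced. The function name and sample strings are hypothetical, and the check covers only the LRE/RLE/LRO/RLO and PDF controls, not the full bidirectional algorithm.

```python
import unicodedata

# Bidi classes of the controls that open an embedding or override.
OPENERS = {"LRE", "RLE", "LRO", "RLO"}
CLOSER = "PDF"  # POP DIRECTIONAL FORMATTING closes the most recent opener


def unmatched_bidi_controls(s):
    """Return (unclosed_openers, orphaned_closers) for the string s."""
    depth = 0
    orphaned = 0
    for ch in s:
        cls = unicodedata.bidirectional(ch)
        if cls in OPENERS:
            depth += 1
        elif cls == CLOSER:
            if depth:
                depth -= 1
            else:
                orphaned += 1  # a closer whose opener is outside this text
    return depth, orphaned


full = "abc \u202eDEF\u202c ghi"  # RLO ... PDF wrapped around "DEF"
fragment = full[5:]               # cut just after the RLO

print(unmatched_bidi_controls(full))
print(unmatched_bidi_controls(fragment))
```

In the fragment, the override that governed "DEF" is gone, so its display order can no longer be recovered from the fragment alone.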

    Are there other stateful mechanisms in Unicode?


    Dean A. Snyder

    Assistant Research Scholar
    Manager, Digital Hammurabi Project
    Computer Science Department
    Whiting School of Engineering
    218C New Engineering Building
    3400 North Charles Street
    Johns Hopkins University
    Baltimore, Maryland, USA 21218

    office: 410 516-6850
    cell: 717 817-4897
