Re: HTML5 encodings

From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Mon Dec 28 2009 - 00:35:57 CST

    On 12/27/2009 8:09 PM, Doug Ewell wrote:
    > Asmus Freytag <asmusf at ix dot netcom dot com> wrote:
    >
    >> The second metric refers to encodings like ISO-2022 or SCSU, which use
    >> control bytes or sequences to switch among character sets. There are
    >> cases where such a scheme could be set up to allow easy
    >> resynchronization in terms of character boundaries, yet still require
    >> that state information be maintained for very long (unbounded)
    >> stretches of data. Assume a 2022-style combination of several
    >> single-byte character sets. If that restriction is known (by announcement),
    >> then resynchronizing to any character boundary is trivial (as long as
    >> you recognize and avoid the escape codes). However, interpreting (or
    >> correctly converting) any given character is impossible without going
    >> back to the most recent character set switching escape code.
    >
    > BOCU-1 has a handy "reset" mechanism, in which the byte 0xFF doesn't
    > participate in the encoding of any character, but simply resets the
    > state of the encoder or decoder. If desired, these could be inserted
    > at certain intervals within a stream to ensure the availability of a
    > synchronization point, solving the problem above.
    >
    > However, such a mechanism naturally means a code point sequence could
    > be encoded in BOCU-1 in more than one way, and it could interfere with
    > the seemingly all-important binary-ordering property of BOCU-1, so the
    > authors apparently felt compelled to invoke the Principle of
    > Pre-Deprecation:
    >
    > "Using FF to reset the state breaks the ordering! The use of FF resets
    > is discouraged."
    >
    > The reset mechanism doesn't seem to be mentioned in the BOCU patent.

    Also, a reset that is merely allowed, rather than enforced by the
    protocol, doesn't improve the theoretical worst case, while still
    suffering from all the problems you mentioned.
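
    To make the distinction concrete, here is a minimal sketch in Python of
    a toy 2022-style scheme. Everything in it is an assumption chosen for
    illustration: two single-byte character sets (Latin-1 and ISO 8859-5)
    switched by the SI/SO control bytes, and the hypothetical helpers
    decode_at() and with_sync_points(). It is not ISO-2022 itself, and it
    does not implement BOCU-1. Finding a character boundary is trivial
    (every non-shift byte is one character), but interpreting a byte means
    scanning back to the most recent shift, however far away that is.

```python
# Toy locking shifts (an assumption, not real ISO-2022 announcement rules):
# SI selects set A, SO selects set B; a stream starts in set A.
SI, SO = 0x0F, 0x0E
CHARSETS = {SI: "latin-1", SO: "iso8859-5"}
INITIAL = SI


def decode_at(data: bytes, offset: int) -> tuple[str, int]:
    """Decode the character at `offset`.

    Returns (character, lookback), where lookback is the number of
    character bytes skipped while scanning back to the most recent shift
    code (or to the start of the stream).
    """
    if data[offset] in CHARSETS:
        raise ValueError("offset points at a shift code, not a character")
    pos = offset - 1
    while pos >= 0 and data[pos] not in CHARSETS:
        pos -= 1
    state = data[pos] if pos >= 0 else INITIAL
    return bytes([data[offset]]).decode(CHARSETS[state]), offset - pos - 1


def with_sync_points(data: bytes, interval: int) -> bytes:
    """Re-emit the active shift code after every `interval` character bytes.

    This is loosely analogous to inserting optional resets: it helps only
    streams whose producer chose to do it.
    """
    out = bytearray()
    state, run = INITIAL, 0
    for b in data:
        if b in CHARSETS:
            state, run = b, 0
        elif run == interval:
            out.append(state)  # redundant shift acting as a sync point
            run = 0
        out.append(b)
        if b not in CHARSETS:
            run += 1
    return bytes(out)


# One shift at the start, then a long run of the same byte value.
plain = bytes([SO]) + b"\xd0" * 1000
print(decode_at(plain, 1000))   # ('а', 999): Cyrillic, found 999 bytes back

synced = with_sync_points(plain, 64)
print(max(decode_at(synced, i)[1]
          for i in range(len(synced)) if synced[i] not in CHARSETS))  # 63
```

    Both streams are equally valid in this toy scheme, so a decoder handed
    arbitrary conforming data still has to be prepared for the first case,
    where the lookback grows with the length of the stream.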

    A./


