Re: HTML5 encodings

From: verdy_p
Date: Mon Dec 28 2009 - 01:27:18 CST

    "Asmus Freytag" wrote:
    > > The reset mechanism doesn't seem to be mentioned in the BOCU patent.
    > Also, a reset that isn't enforced by protocol, but merely allowed,
    > doesn't improve the theoretical worst case. (While suffering from all
    > the problems you mentioned).

    The reset byte can be put to a more useful purpose: it can serve as a key separator when sorting, for example,
    lists of multicolumn output with priority between columns, even if each column is sorted in binary code point order.
    The separator is actually not a character; it represents a metacharacter that sorts higher than everything else, so
    it can effectively terminate all binary-encoded strings (when they are different) and maintain their relative
    ordering. The sort keys for further data columns appended after it will not break the sort order of distinct
    level-1 keys, but you will be able to binary sort on the second column when two rows have binary-identical first
    columns.

    It should never be used within the actual encoding of texts (and it is not even needed).
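
    The key-separator idea above can be sketched roughly as follows. This is only an illustration under stated
    assumptions, not BOCU-1 itself: the columns are encoded as UTF-8 (where the byte 0xFF never occurs in
    well-formed text), 0xFF stands in for the reset byte, and the names SEP, composite_key and rows are
    hypothetical.

```python
# Assumption: 0xFF never appears inside an encoded column value
# (true for well-formed UTF-8, which is used here instead of BOCU-1).
SEP = b"\xff"


def composite_key(columns):
    """Build one binary sort key: each encoded column is terminated by SEP."""
    return b"".join(col.encode("utf-8") + SEP for col in columns)


# Hypothetical two-column rows; the first column has priority.
rows = [
    ("smith", "zoe"),
    ("jones", "bob"),
    ("smith", "amy"),
    ("jones", "al"),
]

# A plain bytewise comparison of the composite keys orders rows by the
# first column, and by the second column only where the first is identical.
sorted_rows = sorted(rows, key=composite_key)

for row in sorted_rows:
    print(row)
```

    Because the separator compares higher than every data byte, a first-column value that is a strict prefix of
    another (such as "ab" and "abc") sorts after the longer one, not before it.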

    This archive was generated by hypermail 2.1.5 : Mon Dec 28 2009 - 01:30:13 CST