Re: HTML5 encodings

From: Doug Ewell
Date: Mon Dec 28 2009 - 22:19:57 CST

    "verdy_p" <verdy underscore p at wanadoo dot fr> wrote:

    > The [BOCU-1] reset byte can be used for something more useful: it can
    > be used as a key separator when sorting for example lists of
    > multicolumn output with priority between columns, even if each column
    > is sorted in binary codepoint order. The separator is actually not a
    > character, but represents a metacharacter that will be higher than
    > everything else, so it can effectively terminate all binary encoded
    > strings (when they are different), and maintain their relative
    > ordering; the following sort keys (further data columns) appended
    > after it will not break the sort order of distinct level-1 keys, but
    > you'll be able to binary sort on the second column when two rows have
    > binary identical first columns...

    Unicode, and even ASCII, contains plenty of seldom-used control
    characters, with defined semantics if that is desirable, which an
    internal process can safely insert, use, and remove for purposes like
    this. There's no need to overload an internal characteristic of an
    encoding to accomplish this, especially since it ties your data to a
    particular encoding.
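
A minimal sketch of the control-character approach, for illustration only. The choice of U+001F (the ASCII Unit Separator) and the helper names `make_key` and `split_key` are assumptions made here, not anything specified in this thread. The sketch also assumes the column data never contains U+001F or any lower code point, so the separator sorts below all data. That differs from the highest-valued separator described in the quoted text; it is chosen here so that a value sorts before its own extensions ("a" before "ab").

```python
# Hypothetical illustration: SEP, make_key and split_key are names invented here.
SEP = "\x1f"  # ASCII Unit Separator (U+001F), a seldom-used control character


def make_key(columns):
    """Join column values into one binary sort key.

    Assumption: every character in the data is above SEP, so the
    separator always compares lower than any data character.
    """
    for value in columns:
        if any(ch <= SEP for ch in value):
            raise ValueError("column data must not contain U+001F or lower")
    # UTF-8 byte order matches code point order, so the encoded keys
    # can be compared as plain bytes.
    return SEP.join(columns).encode("utf-8")


def split_key(key):
    """Remove the separator again and recover the original columns."""
    return key.decode("utf-8").split(SEP)


rows = [("ab", "x"), ("a", "z"), ("a", "y"), ("b", "")]
keys = sorted(make_key(row) for row in rows)
sorted_rows = [tuple(split_key(key)) for key in keys]
print(sorted_rows)
```

Because the separator is an ordinary character inserted and removed by the internal process, the same keys work unchanged if the data is later stored in a different encoding form whose binary order also follows code point order.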

    Someone tried to do something like this with UTF-8 many years ago,
    and to make a long story short, that's how we got the tag characters.

    Doug Ewell  |  Thornton, Colorado, USA  |
    RFC 5645, 4645, UTN #14  |  ietf-languages @
