Re: Problem with SSI and BOM

From: Mark Davis
Date: Mon Sep 25 2006 - 08:12:37 CST

    On 9/24/06, Jukka K. Korpela wrote:
    > On Sun, 24 Sep 2006, Doug Ewell wrote:
    > > A process that claims to be able to "support Unicode"
    > > should at least be able to follow the simple rule, "If the file or
    > > stream starts with EF BB BF, throw them away and treat the remainder
    > > of the file or stream as UTF-8."
    > No, that would be incorrect if the character encoding of the data has been
    > declared. It would be a mistake to start interpreting the octets of data
    > in a manner other than the declared encoding, at least as long as the data
    > is formally correct according to the encoding.

    In theory, that's correct. In practice, however, the charset is declared
    incorrectly all too often. In a browser, the user can reset the charset
    manually if he or she sees that it is wrong. That option is not available to
    more mechanical processes like search engines -- there, the process simply
    can't afford to always believe the charset parameter(s), any more than it
    can always depend on the HTML being valid.
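
To make the trade-off concrete, here is a minimal Python sketch of how a mechanical process might treat the declared charset as a hint rather than as the truth. It is only an illustration under stated assumptions, not a description of what any particular search engine does: the function name decode_lenient, the order of the fallbacks, and the final Latin-1 fallback are all hypothetical choices made for this example. Note that it applies Doug's EF BB BF rule even when a different charset is declared, which is exactly the behaviour Jukka objects to.

```python
import codecs
from typing import Optional, Tuple

UTF8_BOM = codecs.BOM_UTF8  # the three octets EF BB BF


def decode_lenient(data: bytes, declared: Optional[str] = None) -> Tuple[str, str]:
    """Return (text, encoding_used) for a byte stream.

    Hypothetical policy, chosen for illustration only:
      1. A leading EF BB BF wins over any declared charset.
      2. Otherwise try the declared charset, if there is one.
      3. Otherwise (or if that fails) try UTF-8.
      4. As a last resort decode as Latin-1, which never fails.
    """
    if data.startswith(UTF8_BOM):
        # Throw the signature away and treat the remainder as UTF-8.
        return data[len(UTF8_BOM):].decode("utf-8", errors="replace"), "utf-8"

    candidates = []
    if declared:
        candidates.append(declared)
    candidates.append("utf-8")

    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except (UnicodeDecodeError, LookupError):
            # Malformed for this encoding, or an unknown charset label:
            # do not believe the declaration, move on to the next guess.
            continue

    return data.decode("latin-1"), "latin-1"


if __name__ == "__main__":
    # Declared as ISO-8859-1, but the stream starts with the UTF-8 signature.
    print(decode_lenient(b"\xef\xbb\xbfhello", "iso-8859-1"))
    # Declared as UTF-8, but the octets are not valid UTF-8.
    print(decode_lenient(b"caf\xe9", "utf-8"))
```

The sketch only catches declarations that are detectably wrong (a signature mismatch, or octets that are malformed in the declared encoding). Data that is formally correct in the declared encoding but was really written in another one passes through unchanged, and that is the case where a browser user can fix things by hand and a mechanical process would need further heuristics.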

