RE: BOM in HTML

From: Lars Kristan (lars.kristan@hermes.si)
Date: Sat Jan 22 2005 - 11:30:12 CST


    Jon Hanna wrote:
    > HTML files are documents whose encoding is generally stated
    > out-of-band
    > (they are after all primarily used on the web).

    But this OOB data is lost when a file is saved to disk. Or do any
    applications already tag the encoding in the filesystem's own OOB data?
    I've never really thought about it: how does IE handle this when saving files?

    > HTML files can contain <meta /> elements that MAY be used to determine
    > encoding in the absence of such out of band information, and all the

    One would expect the priority, from lowest to highest, to be OOB, BOM, meta,
    which is also the natural order of processing, meaning each directive would
    simply override the previous ones, and not retroactively. Admittedly, a
    disagreement between BOM and meta could be problematic.

    Is there a deeper reason why the actual order is reversed and OOB overrides
    meta?
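
    To make the two orderings concrete, here is a minimal sketch in Python. The
    helper names (sniff_bom, sniff_meta, resolve_last_wins, resolve_oob_first)
    are hypothetical, the meta scan is a crude regular expression rather than a
    real HTML parser, and UTF-32 BOMs are left out. Where the BOM sits in the
    "actual" order is an assumption; the thread only says that OOB overrides
    meta.

```python
import codecs
import re

# BOM byte sequences and the encodings they signal (UTF-32 omitted for brevity).
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Crude pattern for a charset declared inside a <meta> element.
# Only works when the document bytes are ASCII-compatible.
META_RE = re.compile(
    rb'<meta[^>]*charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE
)


def sniff_bom(data):
    """Return the encoding signalled by a leading BOM, or None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


def sniff_meta(data):
    """Return the charset named in an early <meta> element, or None."""
    match = META_RE.search(data[:1024])
    return match.group(1).decode("ascii").lower() if match else None


def resolve_last_wins(oob, data):
    """Order proposed in the message: OOB, then BOM, then meta,
    each directive overriding the previous one."""
    encoding = None
    for candidate in (oob, sniff_bom(data), sniff_meta(data)):
        if candidate:
            encoding = candidate
    return encoding


def resolve_oob_first(oob, data):
    """Order where OOB overrides meta. Placing the BOM between
    the two is an assumption made for this sketch."""
    return oob or sniff_bom(data) or sniff_meta(data)


page = (b'<html><head><meta http-equiv="Content-Type" '
        b'content="text/html; charset=iso-8859-1"></head></html>')
print(resolve_last_wins("utf-8", page))
print(resolve_oob_first("utf-8", page))
```

    With an OOB label of utf-8 and a meta element naming iso-8859-1, the first
    function follows the meta element and the second follows the OOB label,
    which is exactly the disagreement being asked about.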

    > Encoding *is not* switched in the middle of a document.

    It could even be, theoretically, as long as each directive applies only to
    the data that follows it, possibly with the first one being an exception to
    the rule. But I guess there is no need for it.

    > However, the correct
    > encoding may not be known until some of the document has been already
    > processed, which may require some of it to be reprocessed.

    That is why I mentioned an exception to the rule.
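
    The reprocessing Jon describes could look roughly like the sketch below:
    decode with a provisional encoding, look for a meta declaration, and start
    over if it names something else. The function name decode_with_restart and
    the choice of iso-8859-1 as the provisional encoding are assumptions for
    illustration, not how any particular browser does it.

```python
import re

# Crude pattern for a charset declared inside a <meta> element.
META_RE = re.compile(
    r'<meta[^>]*charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE
)


def decode_with_restart(data, provisional="iso-8859-1"):
    """Decode provisionally, then re-decode from the start if a <meta>
    element names a different encoding.

    Returns (text, encoding_used, restarted).
    """
    # First pass: the provisional encoding only has to get the ASCII
    # markup right so that the meta declaration can be found.
    text = data.decode(provisional, errors="replace")
    match = META_RE.search(text)
    if match is None:
        return text, provisional, False

    declared = match.group(1).lower()
    if declared == provisional:
        return text, provisional, False

    try:
        # Second pass: the directive applies retroactively to the whole
        # document, including the bytes that preceded it.
        return data.decode(declared, errors="replace"), declared, True
    except LookupError:
        # Unknown encoding name: keep the provisional result.
        return text, provisional, False


page = '<meta charset="utf-8"><p>caf\u00e9</p>'.encode("utf-8")
text, encoding, restarted = decode_with_restart(page)
print(encoding, restarted)
```

    This is the "exception to the rule": the first directive found is applied
    to data that came before it, which is only possible by reprocessing.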

    And, thanks for the info, Jon.

    Lars


