BOMs at Beginning of Web Pages?

From: Tom Gewecke (tom@bluesky.org)
Date: Sat Feb 15 2003 - 11:59:47 EST

    Michael Everson recently pointed out that the Unicode home page seems to
    begin with the character U+FEFF (ZWNBSP/BOM), encoded as UTF-8. Presumably
    this is an artifact created by the program used to make the page, although
    I haven't noticed it on any other pages on the site.
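
    For reference, U+FEFF serialized as UTF-8 is the three-byte sequence
    EF BB BF. A minimal sketch of checking for it at the start of a page
    follows; the function name and the sample bytes are made up for
    illustration and are not taken from the page in question.

```python
# U+FEFF serialized as UTF-8 is the three bytes EF BB BF.
UTF8_SIGNATURE = "\ufeff".encode("utf-8")


def starts_with_utf8_signature(page: bytes) -> bool:
    """Return True if the raw page bytes begin with U+FEFF encoded as UTF-8."""
    return page.startswith(UTF8_SIGNATURE)


# Hypothetical page bytes, for illustration only.
sample = UTF8_SIGNATURE + b"<html></html>"
print(UTF8_SIGNATURE.hex())
print(starts_with_utf8_signature(sample))
```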

    I had a look at the BOM FAQ and am wondering whether any list members could
    confirm my understanding of the proper use of a BOM at the start of web pages:

    --The only case where a BOM should be used is when the byte order is not
    specified by the encoding/charset listed in the HTML, i.e. UTF-16 or UTF-32.
    For all other charsets, including UTF-16BE, UTF-16LE, UTF-32BE and
    UTF-32LE, it should not be used.

    --If the page is marked UTF-16 and has no BOM it will be interpreted as
    UTF-16BE.

    --U+FEFF can appear (presumably by accident) at the beginning of any web
    page, but aside from the two charsets where it is necessary (UTF-16 and
    UTF-32), it is a ZWNBSP and not a BOM. (As Michael pointed out, Mac IE
    5.2.2 displays a Euro symbol.)
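
    The three points above, as I understand them, can be sketched for the
    UTF-16 case as follows. This is only an illustration of my reading, not
    a statement of what any browser does, and decode_utf16_page is a made-up
    name.

```python
import codecs


def decode_utf16_page(data: bytes) -> str:
    """Decode bytes labelled plain "UTF-16" per the reading described above."""
    if data.startswith(codecs.BOM_UTF16_BE):   # FE FF: big-endian, drop the BOM
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):   # FF FE: little-endian, drop the BOM
        return data[2:].decode("utf-16-le")
    # No BOM: assume big-endian, per the second point above.
    return data.decode("utf-16-be")


# With an explicit "UTF-16BE" label the leading U+FEFF is not consumed as a
# BOM; Python's utf-16-be codec keeps it as an ordinary character.
kept = b"\xfe\xff\x00A".decode("utf-16-be")
print(repr(kept))
```

    In the last line the decoded string still starts with U+FEFF, which is
    the ZWNBSP reading from the third point.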

    Suppose a page has no charset/encoding specified in the markup. Does the
    presence of U+FEFF mean it should be presumed to be UTF-16? Some of my
    browsers behave this way.
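
    One thing worth noting is that U+FEFF serializes to different bytes in
    each encoding form, so the leading bytes themselves hint at which form
    is in use. A rough sketch of that kind of sniffing is below; it is a
    heuristic under my own assumptions, not a description of what any
    particular browser implements, and sniff_signature is a made-up name.

```python
import codecs
from typing import Optional

# Longer signatures first: the UTF-32LE BOM (FF FE 00 00) begins with the
# UTF-16LE BOM (FF FE), so checking UTF-16 first would misreport it.
_SIGNATURES = [
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
]


def sniff_signature(page: bytes) -> Optional[str]:
    """Guess the encoding form from a leading U+FEFF, or None if absent."""
    for signature, name in _SIGNATURES:
        if page.startswith(signature):
            return name
    return None
```

    Under this sketch a page beginning EF BB BF would be guessed as UTF-8
    rather than UTF-16. UTF-16LE text whose first character after the BOM is
    U+0000 would be misread as UTF-32LE, which is one reason to treat the
    result as a guess only.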


