Re: BOM's at Beginning of Web Pages?

From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Mon Feb 17 2003 - 17:44:00 EST


    I would like to add some information here without getting myself into the core of the discussion:

    HTML recognizes far fewer "whitespace" characters than Java or Unicode does. Different people
    (and different specifications) work with different sets of "whitespace" characters.

    Unicode's White_Space property (PropList.txt) contains 24 code points (Unicode 3.2) but not U+FEFF.

    U+FEFF ZWNBSP is a format control (Cf), not any kind of space in the usual sense.
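
    As a quick illustration of the two points above, here is a small Python sketch that looks up
    U+FEFF in whatever Unicode database ships with the interpreter. The helper name describe() is
    made up for this example, and str.isspace() follows Python's own whitespace definition, which is
    close to but not the same thing as the White_Space property in PropList.txt.

```python
import unicodedata

ZWNBSP = "\ufeff"  # U+FEFF ZERO WIDTH NO-BREAK SPACE, also used as the BOM


def describe(ch: str) -> dict:
    """Return a few basic properties of a single character (hypothetical helper)."""
    return {
        "name": unicodedata.name(ch, "<unnamed>"),
        "category": unicodedata.category(ch),  # General_Category, e.g. "Cf" or "Zs"
        "isspace": ch.isspace(),               # Python's notion of whitespace
    }


info = describe(ZWNBSP)
print(info)

# Because U+FEFF is not whitespace, str.strip() leaves a leading U+FEFF in place.
print(repr((ZWNBSP + "text").strip()))
```

    On a stock CPython this should report category "Cf" and isspace False for U+FEFF, while an
    ordinary space reports "Zs" and True.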

    U+FEFF, like all Cf characters, is a Default_Ignorable_Code_Point (DerivedCoreProperties.txt).
    (That is, sorting, searching, matching, etc. usually ignore such code points unless they are
    explicitly relevant to the operation.)
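
    A rough sketch of what "ignoring" such code points during matching could look like in Python.
    Note the assumptions: the standard unicodedata module does not expose
    Default_Ignorable_Code_Point, so this example approximates it by dropping every character whose
    General_Category is Cf, and the two sets are not guaranteed to be identical. The function names
    are invented for the illustration.

```python
import unicodedata


def strip_format_controls(text: str) -> str:
    """Drop every code point whose General_Category is Cf.

    Assumption: Cf is used here only as a stand-in for
    Default_Ignorable_Code_Point, which unicodedata does not provide.
    """
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


def loosely_equal(a: str, b: str) -> bool:
    """Compare two strings while ignoring format controls such as U+FEFF."""
    return strip_format_controls(a) == strip_format_controls(b)


# A leading U+FEFF does not affect the comparison.
print(loosely_equal("\ufeff<html>", "<html>"))
# Real differences in the text are still detected.
print(loosely_equal("\ufeff<html>", "<htm>"))
```

    A real collation or search implementation would take the ignorable set from the Unicode data
    files rather than from the general category.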

    RFC 2279 *is* being updated; see http://www.ietf.org/internet-drafts/draft-yergeau-rfc2279bis-03.txt
    Version -04 is expected to become public shortly.

    markus

    -- 
    Opinions expressed here may not reflect my company's positions unless otherwise noted.
    

