    I would like to add some information here without getting drawn into the core of the discussion:

    HTML recognizes far fewer "whitespace" characters than Java or Unicode. Different specifications
    (and different people) work with different sets of "whitespace" characters.
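
    To illustrate how the sets differ, here is a minimal Python sketch. It assumes the HTML 4.01
    definition of white space (space, tab, form feed, zero-width space, plus CR and LF as line
    breaks) and compares it with Python's own str.isspace(). The names HTML4_WHITESPACE and
    compare() are made up for this example; this is not how any particular parser is implemented.

```python
# Assumption: the white space set as listed in HTML 4.01, section 9.1
# (space, tab, form feed, zero-width space), plus CR and LF line breaks.
HTML4_WHITESPACE = frozenset("\u0020\u0009\u000C\u200B\u000D\u000A")


def is_html4_whitespace(ch: str) -> bool:
    """Return True if ch is in the (assumed) HTML 4.01 white space set."""
    return ch in HTML4_WHITESPACE


def compare(ch: str) -> tuple:
    """Return (HTML 4.01 view, Python str.isspace() view) for one character."""
    return (is_html4_whitespace(ch), ch.isspace())


if __name__ == "__main__":
    for ch in ("\u0020", "\u00A0", "\u200B", "\uFEFF"):
        print(f"U+{ord(ch):04X}", compare(ch))
```

    In this sketch U+00A0 NO-BREAK SPACE counts as whitespace for Python but not for HTML 4.01,
    U+200B is the other way round, and U+FEFF is whitespace for neither.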

    Unicode's White_Space property (PropList.txt) contains 24 code points (Unicode 3.2) but not U+FEFF.

    U+FEFF ZWNBSP is a format control (Cf), not any kind of space in the usual sense.

    U+FEFF, like all Cf, is a Default_Ignorable_Code_Point (DerivedCoreProperties.txt). (That is,
    processes such as sorting, searching, and matching normally ignore it, unless the process
    specifically needs such code points.)
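
    The points above can be checked with Python's standard unicodedata module. The sketch below
    is only an approximation: unicodedata does not expose Default_Ignorable_Code_Point, so the
    hypothetical helper strip_format_controls() drops General_Category Cf characters as a
    stand-in. loosely_equal() is likewise an invented name, not a standard matching algorithm.

```python
import unicodedata

ZWNBSP = "\uFEFF"


def strip_format_controls(text: str) -> str:
    """Drop characters whose General_Category is Cf.

    Assumption: Cf is used here as a rough stand-in for
    Default_Ignorable_Code_Point, which unicodedata does not provide.
    """
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


def loosely_equal(a: str, b: str) -> bool:
    """Compare two strings while ignoring format controls such as U+FEFF."""
    return strip_format_controls(a) == strip_format_controls(b)


if __name__ == "__main__":
    print(unicodedata.name(ZWNBSP))      # character name from the database
    print(unicodedata.category(ZWNBSP))  # General_Category
    print(ZWNBSP.isspace())              # whether Python treats it as whitespace
    print(loosely_equal(ZWNBSP + "abc", "abc"))
```

    Note that str.strip() leaves a leading U+FEFF in place, precisely because the character is
    not whitespace; it has to be handled as an ignorable format control instead.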

    RFC 2279 *is* being updated, see
    Version -04 is supposed to be public shortly.


