From: Roozbeh Pournader (email@example.com)
Date: Sat Feb 15 2003 - 22:48:21 EST
On Sat, 15 Feb 2003, Michael (michka) Kaplan wrote:
> And to whatever extent UTF-8 has a BOM, it would fall under the same
> category. Certainly that is how processors that understand the UTF-8
> BOM deal with it.
Well, that needs some research into what UTF-8 means in W3C and HTML 4.0
terms: what is a character set for interchange over the Internet? Section 6.9 says:
"The "charset" attributes (%Charset in the DTD) refer to a character
encoding as described in the section on character encodings. Values
must be strings (e.g., "euc-jp") from the IANA registry (see
[CHARSETS] for a complete list)."
Note especially the "must" above. The [CHARSETS] reference is:
Registered charset values. Download a list of registered charset
So it's time to go there. The URL above says:
"The Character Sets Registry has moved to the following:
OK, we'll go there instead and search for UTF-8. It says:
"Name: UTF-8 [RFC2279]
Source: RFC 2279
So the definition is RFC 2279. Get a copy from <http://www.ietf.org/rfc/rfc2279.txt>, or
any other place you like, and search it for FEFF, BOM, ZERO WIDTH NO-BREAK
SPACE, or the sequence "EF BB BF". Nothing can be found.
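For anyone who wants to check a served page rather than the RFC text, here is a minimal Python sketch of the same search, applied to the leading bytes of a document. The function names has_utf8_bom and strip_utf8_bom are made up for this illustration; the only library piece used is the standard codecs.BOM_UTF8 constant, which is the byte sequence EF BB BF.

```python
import codecs

# codecs.BOM_UTF8 is the three-byte sequence EF BB BF,
# i.e. U+FEFF ZERO WIDTH NO-BREAK SPACE encoded in UTF-8.

def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string starts with EF BB BF."""
    return data.startswith(codecs.BOM_UTF8)

def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading EF BB BF, if one is present."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data

# Hypothetical page contents, used only to exercise the helpers.
page_with_bom = b"\xef\xbb\xbf<html><body>Unicode</body></html>"
page_without_bom = b"<html><body>Unicode</body></html>"

print(has_utf8_bom(page_with_bom))
print(has_utf8_bom(page_without_bom))
print(strip_utf8_bom(page_with_bom) == page_without_bom)
```

An author could run something like strip_utf8_bom over a page before publishing it, which is the "fix the page" side of the argument; a browser doing the same on input would be the "fix the browser" side.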
> Rather than treating HTML like the SQL standard (lofty goals that no
> one company completely supports because it would be insane to do it!)
> they can bend to the actual usage out there and just move on, right?
> How many browsers plan to refuse to show pages that do not follow HTML
> 4.0 rules? :-)
I agree, but the Unicode web page is the buggy thing here, not the specific
browser that was reported earlier to have a problem with it. That's my whole
point. One should fix the Unicode web page instead of that browser.
I also personally believe that any browser should fix the small mistakes
made by the author (or the authoring software) in some way or other, but
isn't it better for the author not to make the mistake, or to fix it when one
finds out about it?