Re: (Informational only: UTF-8 BOM and the real life)

From: Leif H Silli <xn--mlform-iua_at_xn--mlform-iua.no>
Date: Sat, 28 Jul 2012 19:00:15 +0300

Steven replied:

>> In XML 1.0 the BOM is in fact described as a signature regardless of
>> which Unicode encoding it is used with:
>>
>> |http://www.w3.org/TR/xml/#charencoding
>
> Yes, simply spoken out and clarified like that, and everybody
> knows what to deal with.
>
> And btw., my local copy of XML 1.1 (Second Edition, thus current)
> doesn't include this paragraph (in the referenced 4.3.3):
>
> |If the replacement text of an external entity is to begin with
> |the character U+FEFF, and no text declaration is present, then
> |a Byte Order Mark MUST be present, whether the entity is encoded
> |in UTF-8 or UTF-16.

I think you need to reread it. I find the same "signature" sentence in XML 1.1:

http://www.w3.org/TR/xml11/#charencoding
 
> But I don't see the big picture of all those markup standards; I
> just have them in case my own work raises some questions.

We now have some data indicating that what Unicode says about the UTF-8
BOM is worded in a way that can be misunderstood. I agree with you that
Unicode should be more explicit about the fact that

* it is neutral about the BOM in UTF-8 (currently it is possible to read it
as if Unicode advises against the BOM)

* the BOM is an encoding signature, for both UTF-8 and UTF-16.
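
To illustrate the "signature" reading, here is a minimal Python sketch. It
assumes a hypothetical helper, sniff_signature (my own name, not taken from
Unicode or XML), that only inspects the leading bytes of a stream. It covers
UTF-8 and UTF-16 only and is not a full encoding detector:

```python
import codecs

# Signatures checked in this sketch. UTF-32 is deliberately left out,
# so this is an illustration and not a complete detector.
_SIGNATURES = (
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
)


def sniff_signature(data: bytes):
    """Return (encoding, bom_length) if data starts with a known BOM,
    otherwise (None, 0)."""
    for bom, name in _SIGNATURES:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

In both cases the leading bytes work the same way: they name the encoding
and are skipped before the text proper is decoded. A fuller version would
have to test for UTF-32 first, because the UTF-32 LE BOM (FF FE 00 00)
begins with the same two bytes as the UTF-16 LE BOM.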

--
leif halvard silli 
Received on Sat Jul 28 2012 - 11:03:50 CDT