Re: (Informational only: UTF-8 BOM and the real life)

From: Steven Atreju <snatreju_at_googlemail.com>
Date: Mon, 30 Jul 2012 12:52:13 +0200

"Leif H Silli" <xn--mlform-iua_at_xn--mlform-iua.no> wrote:

 |We now have some data that indicates that what Unicode says about the UTF-8
 |BOM is worded in a way that is possible to misunderstand. I support you in

Yeah! Yeah! Yeah!, that is good to read black on #FCFCF9.

 |Steven replied:
 |
 |>> In XML 1.0 the BOM is in fact described as a signature regardless of
 |>> which Unicode encoding it is used with:
 |>>
 |>> |http://www.w3.org/TR/xml/#charencoding
 |>
 |> Yes, simply spoken out and clarified like that, and everybody
 |> knows what to deal with.
 |>
 |> And btw., my local copy of XML 1.1 (Second Edition, thus current)
 |> doesn't include this paragraph (in the referenced 4.3.3):
 |>
 |> |If the replacement text of an external entity is to begin with
 |> |the character U+FEFF, and no text declaration is present, then
 |> |a Byte Order Mark MUST be present, whether the entity is encoded
 |> |in UTF-8 or UTF-16.
 |
 |I think you must reread. I find the same "signature" sentence in XML 1.1:
 |
 |http://www.w3.org/TR/xml11/#charencoding
 |
 |> But I don't see the big picture of all those markup standards; I
 |> just have them in case my own work raises some questions.
 |
 |We now have some data that indicates that what Unicode says about the UTF-8
 |BOM is worded in a way that is possible to misunderstand. I support you in
 |that Unicode should be more explicit about the fact that
 |
 |* it is neutral about the BOM in UTF-8 (currently it is possible to read it
 |as if Unicode advises against the BOM)
 |
 |* The BOM is an encoding signature - for both UTF-8 and UTF-16.
 |--
 |leif halvard silli
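
To make the "signature" reading above concrete, here is a minimal sketch of
how a reader could treat a leading BOM purely as an encoding signature for
UTF-8 and UTF-16. The function names and the fallback default are my own
assumptions for illustration; nothing here is taken from the Unicode or XML
texts. UTF-32 is deliberately left out, so UTF-32LE input (whose BOM begins
with the UTF-16LE one) would be misdetected by this sketch.

```python
import codecs

# Signatures checked in this sketch, as (BOM bytes, encoding name).
# Assumption: only UTF-8 and UTF-16 matter here; UTF-32 is not handled.
_SIGNATURES = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
)


def sniff_signature(data: bytes):
    """Return (encoding, bom_length), or (None, 0) if no signature is found."""
    for bom, name in _SIGNATURES:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def decode_with_signature(data: bytes, default: str = "utf-8") -> str:
    """Decode data, using a leading BOM (if any) as the signature.

    The BOM bytes are skipped, so U+FEFF does not show up in the result.
    Without a BOM the hypothetical `default` encoding is used instead.
    """
    encoding, skip = sniff_signature(data)
    return data[skip:].decode(encoding or default)
```

With this, a UTF-8 file with or without the three signature bytes decodes to
the same text, which is the "neutral" behaviour argued for above.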
Received on Mon Jul 30 2012 - 06:00:42 CDT