From: Peter Constable (petercon@microsoft.com)
Date: Wed Jan 19 2005 - 22:54:19 CST
> From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On
> Behalf Of Hans Aberg
> It is just that it is in effect a file encoding format, not a character
> encoding format, originally tied to the MS OS. Unicode should not promote
> any specific OS over another. Plain text files do not have a BOM, period.
I've generally been deleting all this blather -- seems like every year
and a half or so someone comes along raising a ruckus about UTF-8 -- so
perhaps this has been said; if so, please forgive the duplication.
The suggestion that Unicode is promoting a specific OS, specifically
Windows, based on statements in the standard related to UTF-8 is hard to
take seriously given that that OS does not itself use UTF-8 in its file
system, in its shell, nor by default in any of its internal operations
or APIs (some APIs, such as WideCharToMultiByte, can be coerced into
passing UTF-8).
As for whether plain text files can have a BOM, that is one of the few
unending debates that arise with certain (fortunately not too frequent)
regularity, each time with vociferous expressions of deeply-held beliefs
but never any resolution. I'll just observe that the formal grammar for
XML does not make reference to a BOM, yet the XML spec certainly assumes
that a well-formed XML document may begin with a UTF-8 BOM (or a BOM in
any Unicode encoding form/scheme). Rather than have a philosophical
debate about the definition of "plain text file", I suggest a more
pragmatic approach: for better or worse, plain text processes that
support UTF-8 are going to encounter UTF-8 data beginning with a BOM:
learn to live with it!
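As a rough illustration of what "living with it" might look like in practice, here is a minimal sketch in Python. It is not from any particular product; the helper names strip_utf8_bom and decode_utf8_text are hypothetical and exist only for this example. The idea is simply that a process reading UTF-8 tolerates an optional leading BOM (bytes EF BB BF) rather than treating it as content or as an error.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present.

    Hypothetical helper for illustration; only a BOM at the very start is removed.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def decode_utf8_text(data: bytes) -> str:
    """Decode UTF-8 bytes to text, tolerating an optional leading BOM.

    The standard "utf-8-sig" codec skips one leading BOM when decoding and
    otherwise behaves like plain "utf-8".
    """
    return data.decode("utf-8-sig")


if __name__ == "__main__":
    with_bom = codecs.BOM_UTF8 + "plain text".encode("utf-8")
    without_bom = "plain text".encode("utf-8")
    # Both inputs should yield the same text once the BOM is tolerated.
    print(decode_utf8_text(with_bom) == decode_utf8_text(without_bom))
```

Whether a process should also write a BOM on output is a separate question that this sketch deliberately leaves alone.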
(Now I'll give advance notice: I'll probably resume deleting this thread
on first sight, so don't take it personally if I don't respond to a
reply.)
Peter Constable