RE: Subject: Re: 32'nd bit & UTF-8

From: Peter Constable (petercon@microsoft.com)
Date: Wed Jan 19 2005 - 22:54:19 CST

> From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org]
> On Behalf Of Hans Aberg

> It is just that it is in effect a file encoding format, not a character
> encoding format, originally tied to the MS OS. Unicode should not promote
> any specific OS over another. Plain text files do not have a BOM, period.

I've generally been deleting all this blather -- seems like every year
and a half or so someone comes along raising a ruckus about UTF-8 -- so
perhaps this has been said; if so, please forgive the duplication.

The suggestion that Unicode is promoting a specific OS, specifically
Windows, based on statements in the standard related to UTF-8 is hard to
take seriously given that that OS does not itself use UTF-8 in its file
system, in its shell, nor by default in any of its internal operations
or APIs (some APIs, such as WideCharToMultiByte, can be coerced into
passing UTF-8).

As for whether plain text files can have a BOM, that is one of the few
unending debates that arise with certain (fortunately not too frequent)
regularity, each time with vociferous expressions of deeply-held beliefs
but never any resolution. I'll just observe that the formal grammar for
XML does not make reference to a BOM, yet the XML spec certainly assumes
that a well-formed XML document may begin with a UTF-8 BOM (or a BOM in
any Unicode encoding form/scheme). Rather than have a philosophical
debate about the definition of "plain text file", I suggest a more
pragmatic approach: for better or worse, plain text processes that
support UTF-8 are going to encounter UTF-8 data beginning with a BOM:
learn to live with it!
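
To make "learn to live with it" concrete, here is a minimal sketch in
Python of a reader that tolerates a leading UTF-8 BOM (the bytes EF BB BF).
The function names strip_utf8_bom and decode_utf8_text are hypothetical,
chosen only for illustration; codecs.BOM_UTF8 and the "utf-8-sig" codec
are part of the Python standard library. This is one possible approach,
not a recommendation from the standard.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def decode_utf8_text(data: bytes) -> str:
    """Decode UTF-8 bytes, accepting input with or without a leading BOM."""
    # The "utf-8-sig" codec drops a single BOM at the start of the input;
    # a U+FEFF appearing later in the text is left in place.
    return data.decode("utf-8-sig")
```

Either function accepts input that has no BOM and returns it unchanged,
so a process using them does not need to know in advance which kind of
file it has been handed.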

(Now I'll give advance notice: I'll probably resume deleting this thread
on first sight, so don't take it personally if I don't respond to a
reply.)

Peter Constable
