From: Hans Aberg (firstname.lastname@example.org)
Date: Wed Jan 19 2005 - 17:51:29 CST
On 2005/01/19 21:37, Peter Kirk at email@example.com wrote:
>>> Maybe. Nevertheless, they exist, not only as a result of unintelligent
>>> conversion from UTF-16 or UTF-32 to UTF-8, but also because at least one
>>> UTF-8 editor, Notepad on Windows 2000 (and XP?), always emits a BOM at
>>> the start of a UTF-8 file.
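A minimal sketch of how a BOM can survive an "unintelligent" UTF-16 to UTF-8 conversion, using Python's built-in codecs as a stand-in for whatever converter is actually involved. Decoding with an explicit byte order keeps U+FEFF as an ordinary character, which then re-encodes as the three bytes EF BB BF. The generic "utf-16" codec instead consumes the BOM. The function names are hypothetical.

```python
def naive_utf16le_to_utf8(data: bytes) -> bytes:
    # An explicit byte order ("utf-16-le") treats U+FEFF as ordinary text,
    # so it is carried over into the UTF-8 output as EF BB BF.
    return data.decode("utf-16-le").encode("utf-8")

def careful_utf16_to_utf8(data: bytes) -> bytes:
    # The generic "utf-16" codec uses the BOM to pick the byte order
    # and drops it from the decoded text.
    return data.decode("utf-16").encode("utf-8")

src = "\ufeffhi".encode("utf-16-le")  # b'\xff\xfeh\x00i\x00'
print(naive_utf16le_to_utf8(src))    # b'\xef\xbb\xbfhi'
print(careful_utf16_to_utf8(src))    # b'hi'
```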
>> Well, it seems easier to change that single editor, then. ...
> It's not easy to change a program with an installed base in the hundreds
> of millions worldwide! But I suppose it could be done as part of a
> Windows service pack etc.
It would be strange if MS couldn't provide an upgrade for such a small
software change, especially since other software gets updated routinely.
> But that assumes that everyone would agree that this change would be a
> good idea. Oliver doesn't, and he makes a good point.
Well, isn't that a problem for MS then? BOMs break things on UNIX platforms,
so they are not going to be honored there anyway.
>> ... Or write a program
>> that removes it when needed. Note, however, that most tools just act on byte
>> streams. If a generated lexer is involved and it is correctly written, it
>> will generate an error for anything that is not correct. On the BOM
>> question, some people simply want BOMs to be ignored.
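A "program that removes it at need" can be very small. The sketch below, with a hypothetical function name, strips at most one leading UTF-8 BOM from a byte stream and leaves everything else untouched, so ASCII-only input passes through unchanged.

```python
UTF8_BOM = b"\xef\xbb\xbf"  # U+FEFF encoded in UTF-8

def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a single leading UTF-8 BOM, if one is present."""
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):]
    return data

print(strip_utf8_bom(b"\xef\xbb\xbfecho hi"))  # b'echo hi'
print(strip_utf8_bom(b"echo hi"))              # b'echo hi'
```

When the data is being decoded anyway, Python's "utf-8-sig" codec does the same thing: it skips a leading BOM if present and otherwise behaves like plain "utf-8".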
> I thought everyone was required to ignore BOMs once the encoding
> has been determined.
The problem is that UNIX software looks at the first bytes of a file to
determine whether it is a shell script. This relies on the special property
of the original UTF-8 that it is the identity on ASCII data. Requiring a BOM
destroys this ASCII-compatibility property. And lexers written for ASCII
data will most likely treat a BOM as an error.
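A minimal sketch of both failure modes, assuming a simplified model in which a file counts as a script only if its first two bytes are "#!" and an ASCII-only lexer rejects any byte at or above 0x80. Real kernels and lexers do more than this; the function names are hypothetical.

```python
UTF8_BOM = "\ufeff".encode("utf-8")  # b'\xef\xbb\xbf'

def has_shebang(data: bytes) -> bool:
    # Simplified model: a script must start with the ASCII bytes '#' '!'
    # at offset 0.
    return data[:2] == b"#!"

def is_ascii(data: bytes) -> bool:
    # An ASCII-only lexer would reject any byte >= 0x80.
    return all(b < 0x80 for b in data)

script = "#!/bin/sh\necho hello\n"
plain = script.encode("utf-8")
with_bom = UTF8_BOM + plain

# UTF-8 is the identity on ASCII data: the bytes are the same.
assert plain == script.encode("ascii")

print(has_shebang(plain), has_shebang(with_bom))  # True False
print(is_ascii(plain), is_ascii(with_bom))        # True False
```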