From: Lars Kristan (lars.kristan@hermes.si)
Date: Wed Jan 19 2005 - 14:15:18 CST
Hans Aberg wrote:
> On 2005/01/19 01:56, Peter Kirk at peterkirk@qaya.org wrote:
>
> > On 19/01/2005 00:09, Hans Aberg wrote:
> >> UTF-8 BOM's seem pointless.
>
> > Maybe. Nevertheless, they exist, not only as a result of unintelligent
> > conversion from UTF-16 or UTF-32 to UTF-8, but also because at least one
> > UTF-8 editor, Notepad on Windows 2000 (and XP?), always emits a BOM at
> > the start of a UTF-8 file.
>
> Well, it seems easier to change that single editor, then. Or write a program
> that removes it at need.
At first, one would think that the UTF-8 'BOM' emitted by Notepad is an
oversight, a bug. But that is not the case.
A long time ago, Notepad worked only on 8-bit legacy-encoded files, always in
your current Windows codepage.
Then Notepad was rewritten in Unicode and got the ability to save files in
'Unicode' (UCS-2). When opening a file, it used the BOM to distinguish the
two flavors of text files.
Later, Notepad also got the ability to save UTF-8 files. The UTF-8 'BOM' is
emitted for the same purpose: to be able to distinguish UTF-8 files from
legacy-encoded files. So you always get back the text you saved, displayed
properly. But yes, you cannot use Notepad to edit UNIX files or UTF-8 HTML
files, because the BOM it adds gets in the way there.
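
To make the mechanism concrete, here is a minimal sketch in Python of BOM-based
detection as described above. It is not Notepad's actual code. The function
names are made up for illustration, and cp1252 is only an assumed stand-in for
"your current Windows codepage". The last function is the kind of small
program Hans suggests for removing the BOM when needed.

```python
import codecs

# Assumption: cp1252 stands in for the user's current Windows codepage.
LEGACY_CODEPAGE = "cp1252"


def sniff_encoding(data):
    """Guess (encoding, bom_length) from the leading bytes of a file."""
    if data.startswith(codecs.BOM_UTF8):        # EF BB BF
        return "utf-8", len(codecs.BOM_UTF8)
    if data.startswith(codecs.BOM_UTF16_LE):    # FF FE
        return "utf-16-le", len(codecs.BOM_UTF16_LE)
    if data.startswith(codecs.BOM_UTF16_BE):    # FE FF
        return "utf-16-be", len(codecs.BOM_UTF16_BE)
    # No BOM: treat the file as legacy 8-bit text.
    return LEGACY_CODEPAGE, 0


def decode_text(data):
    """Decode file bytes, using the BOM (if any) to choose the encoding."""
    encoding, bom_length = sniff_encoding(data)
    return data[bom_length:].decode(encoding)


def strip_utf8_bom(data):
    """Return the bytes without a leading UTF-8 BOM, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data
```

Without the BOM, a UTF-8 file falls through to the legacy branch and its
non-ASCII characters come out garbled, which is exactly the case the BOM is
there to prevent.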
It's a question of what Notepad is: is it a plain text editor, or is it an
editor for "Text documents"? From Microsoft's perspective it's probably the
latter, since Windows has practically no text files at all except those
generated as "Text documents". For everything else (like HTML), you have
tools.
Not that I agree with that approach or like the consequences, but that is
what they probably had in mind.
Lars