Re: Subject: Re: 32'nd bit & UTF-8

From: Peter Kirk (peterkirk@qaya.org)
Date: Thu Jan 20 2005 - 19:15:33 CST

Next message: Hans Aberg: "Re: Subject: Re: 32'nd bit & UTF-8"

Previous message: Adam Twardoch: "Re: Subject: Re: 32'nd bit & UTF-8"
In reply to: Hans Aberg: "Re: Subject: Re: 32'nd bit & UTF-8"
Next in thread: Hans Aberg: "Re: Subject: Re: 32'nd bit & UTF-8"
Reply: Hans Aberg: "Re: Subject: Re: 32'nd bit & UTF-8"

On 20/01/2005 20:46, Hans Aberg wrote:

> ...
>
>It is not my claim, but some posters originally said that the reason for
>requiring the BOM in UTF-8 processes was a MS text editor that always stamped
>BOM's onto UTF-8 files.
>
>If you know the correct answer of these things, why don't you enlighten
>these other posters so that this discussion terminates? After all, requiring
>BOM's in UTF-8 data is really stupid, so it must be interesting to get to
>know what moron introduced it.
>
>
>
I agree, Hans. This requirement would be really stupid. But you are the
person who introduced it. In the light of this perhaps you might like to
reconsider the word "moron".

Possibly you imagined this requirement because you misunderstood what I
wrote. I wrote something to the effect that a process reading a UTF-8
stream was obliged to recognise a BOM as such, and not as the character
U+FEFF ZERO WIDTH NO-BREAK SPACE, because that is what the Unicode
standard seems to say, although it leaves some room for interpretation.
I never suggested that a UTF-8 stream was required to start with a BOM,
as this is clearly untrue. In fact the Unicode standard explicitly
recommends against this Microsoft practice in most circumstances.

One minute later, Hans Aberg wrote:

>There is at least an informal notion of a plain text file. And that is UTF-8
>without a BOM, I feel sure.
>

Well, however sure you are, you are wrong. The Unicode standard
specifies that a BOM may optionally appear at the start of a string of
UTF-8 characters whose encoding is not otherwise specified. (Note
carefully the "optionally".) This must include the start of a plain text
file, at least where, as in the Unix world, there is no out-of-band
information about its encoding.
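
To make the reading behaviour described above concrete, here is a minimal
sketch in Python. It is only an illustration of one interpretation, not
anything taken from the standard: the function name decode_utf8_stream is
made up for this example, and the choice to leave any later U+FEFF in the
text untouched is an assumption. It accepts a stream with or without a
leading BOM and treats a leading EF BB BF as a signature rather than as
part of the text.

```python
import codecs


def decode_utf8_stream(data: bytes) -> str:
    """Decode UTF-8 bytes, accepting an optional leading BOM.

    A leading EF BB BF is treated as an encoding signature and dropped.
    Assumption for this sketch: U+FEFF anywhere else is ordinary content
    and is kept.
    """
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    return data.decode("utf-8")


# Both forms are accepted and yield the same text.
with_bom = codecs.BOM_UTF8 + "plain text".encode("utf-8")
without_bom = "plain text".encode("utf-8")
same_text = decode_utf8_stream(with_bom) == decode_utf8_stream(without_bom)
```

Under this sketch neither form is rejected as corrupt: the BOM is optional
on input, and a file that lacks one decodes exactly as before.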

...

>Posters said originally that it came from a MS text editor that always
>stamps BOM's onto files.
>

I think you are again misunderstanding something I wrote. I was, I think,
the first to mention that a MS text editor emits BOMs at the start of
UTF-8 files. But neither I nor, I think, anyone else except you has said
that this format was originated by MS. I suggested the opposite, that MS
took this format from the standard as it already existed.

...

>The UTF-8 without BOM's is already taking off. But formally, in the eyes of
>Unicode, that is a corrupted UTF-8.
>

Not true. Read the standard. Or just read the extract I quoted to you.
Or the extracts which Ken has just posted.

...

>As I mentioned before, this is what other posters said. Go to them for
>proof.
>

Please tell me who wrote such lies and where. Look in the archives of
this list. I think what has really happened is that in your enthusiasm
to reply to every posting on this list you haven't bothered to read
properly and understand what you are replying to. You have made
something like 40 postings in 24 hours. Is this a record?

In another message, Hans Aberg wrote, replying to Rick McGowan:

>> Hmmm... I don't recall that the Unicode Standard ever specifies that the
>> Byte Order Mark is *required* to be used anywhere for any purpose. Can you
>> point me to the place in the standard where this is stated?
>
>
>
>Several posters have claimed that, most recently Arcane Jill.
>
>

No, she did not, she wrote precisely the opposite with special emphasis:

> Unicode does NOT require that all UTF-8 text files must begin with a BOM

-- 
Peter Kirk
peter@qaya.org (personal)
peterkirk@qaya.org (work)
http://www.qaya.org/


This archive was generated by hypermail 2.1.5 : Thu Jan 20 2005 - 20:12:02 CST