Re: UTF-8 'BOM' (was RE: Subject: Re: 32'nd bit & UTF-8)

From: Antoine Leca (Antoine10646@leca-marti.org)
Date: Fri Jan 21 2005 - 13:12:03 CST

OK, looks like I was also too terse.

I wrote:
>> fopen is normally text ("w"), binary mode ("wb") is rare and even
>> then identical to text.

On the first and third parts, we each know what the other means; you
described them at length.

When I wrote:
>> binary mode ("wb") is rare

I meant to highlight that the purpose of this "b" flag is often lost in
programs of *nix heritage. And this is a problem.
Granted, if the source never leaves the *nix world, there is no actual problem.
But problems arise when you try to port it elsewhere, particularly
when the same file is written with "w" and read with "rb" by the same set of
programs (yes, it happens).
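
A minimal sketch of that "w"/"rb" mismatch, assuming a hosted C
implementation. The function names and the file path passed in are
hypothetical, chosen only for illustration. On *nix the byte count read back
equals the string length; on a Windows-style implementation, text mode
typically writes each '\n' as CR LF, so the binary reader sees one extra byte
per line.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write `text` through a text-mode stream ("w"), then read the file back
   through a binary stream ("rb") and return the number of raw bytes seen.
   Returns -1 on I/O failure. */
long bytes_after_text_write(const char *path, const char *text)
{
    assert(path != NULL && text != NULL);

    FILE *out = fopen(path, "w");      /* text mode: '\n' may be translated */
    if (out == NULL)
        return -1;
    if (fputs(text, out) == EOF) {
        fclose(out);
        return -1;
    }
    if (fclose(out) != 0)
        return -1;

    FILE *in = fopen(path, "rb");      /* binary mode: no translation */
    if (in == NULL)
        return -1;
    long count = 0;
    while (fgetc(in) != EOF)
        count++;
    fclose(in);
    return count;
}

/* Count the '\n' characters in a string. */
size_t count_newlines(const char *text)
{
    size_t n = 0;
    for (; *text != '\0'; text++)
        if (*text == '\n')
            n++;
    return n;
}
```

Comparing the returned count with strlen(text) shows whether the platform
translated line endings; a program that writes "w" and reads "rb" has to be
prepared for either result.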

Windows is probably a necessary evil, but that is no excuse for taking
shortcuts and doing things in a way that only creates problems for
others.

Of course, this is not directed at you; there is nothing personal here.

Lars Kristan answered:
> BTW (yes, again and again): This is something Windows is not able
> to achieve.

What do you mean? (Simple curiosity; I did not get your point.)
Yes, Windows goes through extravagant contortions with codepages in filenames
(and this is disappearing, fortunately), but that should be irrelevant. Yes, C
programs on Windows "eat" CR.
But I fail to see significant examples beyond those (I consider the "feature"
of a BOM at the beginning of Notepad/RichEdit/whatever UTF-8 files to be an
outright bug that was not caught in time and is now entrenched).
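
For readers who have to cope with such files, a minimal sketch of detecting
the three-byte UTF-8 signature (EF BB BF, the encoding of U+FEFF) at the start
of a buffer. The function name is hypothetical, and whether to skip the
signature or pass it through is left to the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Return the number of leading bytes occupied by a UTF-8 signature:
   3 if the buffer starts with EF BB BF, otherwise 0. */
size_t utf8_bom_length(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return 3;
    return 0;
}
```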

> But that does not mean no Unicode application is able
> to do it. Application that processes text in UTF-8 is also able to
> do it. UTF-16 applications on the other hand are not.

Sorry: you mean that a Unicode application is able to "absorb" any stream
(including erroneous encodings) when programmed in UTF-8, but not when
programmed in UTF-16?
I would have expected just the reverse (because of the requirement that
overlong encodings be treated as illegal).
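
To make the overlong case concrete, a minimal sketch of a check a UTF-8
decoder might apply. The function name is hypothetical; it looks only at the
lead byte and the first continuation byte, and it is not a complete validator
(surrogates, truncated sequences and values above U+10FFFF are not checked).

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if the sequence starting at s[0] is an overlong UTF-8 form,
   0 otherwise. Only the first two bytes are inspected. */
int utf8_is_overlong(const unsigned char *s, size_t len)
{
    assert(s != NULL || len == 0);
    if (len < 2 || (s[1] & 0xC0) != 0x80)
        return 0;                   /* no lead byte + continuation pair */
    if (s[0] == 0xC0 || s[0] == 0xC1)
        return 1;                   /* 2-byte form of U+0000..U+007F */
    if (s[0] == 0xE0 && s[1] < 0xA0)
        return 1;                   /* 3-byte form of a value below U+0800 */
    if (s[0] == 0xF0 && s[1] < 0x90)
        return 1;                   /* 4-byte form of a value below U+10000 */
    return 0;
}
```

For example, C0 AF is an overlong form of '/' (U+002F) and must be rejected,
while C3 A9 (U+00E9) is the shortest form and is accepted.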

Antoine
