RE: UTF-8 'BOM' (was RE: Subject: Re: 32'nd bit & UTF-8)

From: Lars Kristan (lars.kristan@hermes.si)
Date: Fri Jan 21 2005 - 16:41:27 CST

Next message: Jon Hanna: "RE: So how about U+D7FD for a NOP then?"

Previous message: Lars Kristan: "RE: Subject: Re: 32'nd bit & UTF-8"
Maybe in reply to: Lars Kristan: "UTF-8 'BOM' (was RE: Subject: Re: 32'nd bit & UTF-8)"
Next in thread: Mark Leisher: "The "JDGI" file grows [was re: UTF-8, BOM, 32'nd bit]"
Reply: Mark Leisher: "The "JDGI" file grows [was re: UTF-8, BOM, 32'nd bit]"

Andy Heninger wrote:

> From a USER'S PERSPECTIVE?

In this case, I am the user, since I use the C language to write software.
Sorry for the ambiguity in my response.

> Text files should be opened in text mode;
> binary files should be opened in binary mode.
> So says the applicable standards.

I don't know much about the standards, but I suspect they do not prescribe
how files must be opened. They simply define _standard_ ways of doing it
and _standard_ ways of specifying what to do.

It is a pity one needs to decide on the type of the file before opening it.
Apart from the extension (which is very unreliable) and the application's
expectations, there is no way to tell what the file really contains. Only
when you open it can you start determining what it is. Sometimes there is a
solution for that, but not always. And even when there is one, it is
typically costly.
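
To make the "open it first, then decide" point concrete, here is a minimal
C sketch. It opens the file in binary mode, looks at the first few hundred
bytes, and guesses. The helper names looks_like_text and sniff_file are
hypothetical, and the "a NUL byte means binary" rule is only an assumed
heuristic, not something any standard defines (UTF-16 text, for one, would
fail it).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: guess whether a buffer holds text.
   Assumption: any NUL byte is taken as a sign of binary data.
   Returns 1 for "probably text", 0 for "probably binary". */
int looks_like_text(const unsigned char *buf, size_t n)
{
    assert(buf != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (buf[i] == 0x00)
            return 0;
    }
    return 1;
}

/* Hypothetical helper: open in binary mode so that nothing is
   translated, read the first bytes, and classify them.
   Returns 1 (text), 0 (binary) or -1 (could not open or read). */
int sniff_file(const char *path)
{
    unsigned char head[512];
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    size_t n = fread(head, 1, sizeof head, fp);
    int failed = ferror(fp);
    fclose(fp);

    if (failed)
        return -1;
    return looks_like_text(head, n);
}
```

The cost mentioned above shows up here as well: the caller has to open the
file once to sniff it and then open it again (or rewind) in whatever mode it
finally settles on.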

Then there are other problems. You could argue perhaps that it is the
application's expectation that counts. Well, I've wasted a lot of paper and
time whenever I forgot to specify the /b switch in a copy command directed
to LPT. And that is just one example. There are many other similar problems.
So many that I've started to like UNIX, even though I grew up with Microsoft.

So, your philosophy is to distinguish text and binary data. Someone else's
philosophy is to not do so. And they both work. The two of you should agree
to disagree, but should both be given an equal chance to learn whether
you're right or not.

And this is where the Unicode standard is right. It allows the BOM in UTF-8
but does not prescribe it. Where the UTC is not right is ... oh well, I've
said it too many times already.
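
Since the BOM is allowed but not required, a reader has to cope with both
cases. A minimal C sketch of that follows. The function names
utf8_bom_length and skip_utf8_bom are hypothetical, and the sketch assumes a
seekable stream opened in binary mode and positioned at its start.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Returns 3 if buf starts with EF BB BF (U+FEFF encoded in UTF-8),
   otherwise 0. */
size_t utf8_bom_length(const unsigned char *buf, size_t n)
{
    assert(buf != NULL || n == 0);
    if (n >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return 3;
    return 0;
}

/* Leaves the stream just past the BOM if there is one, or back at
   the start if there is none.
   Returns 1 (BOM skipped), 0 (no BOM) or -1 (I/O error). */
int skip_utf8_bom(FILE *fp)
{
    unsigned char head[3];
    size_t n = fread(head, 1, sizeof head, fp);

    if (ferror(fp))
        return -1;
    if (utf8_bom_length(head, n) == 3)
        return 1;

    /* No BOM: the bytes just read are ordinary data, so go back. */
    if (fseek(fp, 0L, SEEK_SET) != 0)
        return -1;
    return 0;
}
```

Either way the caller continues reading from the first byte of actual
content, so files with and without the BOM are handled alike.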

Lars

This archive was generated by hypermail 2.1.5 : Fri Jan 21 2005 - 16:46:19 CST