Re: UTF-8 BOM (Re: Charset declaration in HTML) from Steven Atreju on 2012-07-13 (Unicode Mail List Archive)

From: Steven Atreju <snatreju_at_googlemail.com>
Date: Fri, 13 Jul 2012 22:38:45 +0200

Philippe Verdy <verdy_p_at_wanadoo.fr> wrote:

Sure, I know the former, and I bet there has been a lot of discussion.

|anything else than text file types (whatever they are). For example
|BOMs have absolutely no role for encoding binary images, even if they
|include internal multibyte numeric fields.

Well, it boils down to that, doesn't it? If Unicode *defines* that
the so-called BOM is in fact a Unicode-indicating tag that MUST
be present, then it is very clear what has to happen for, say,
'$ cat tagless tagged > out' (in a UTF-8 environment). I don't
agree with that, though, for the reasons I tried to put into
English words, but that is solely my problem. Another approach
would be an explicit UTF-8-BOM charset. Or, of course,
deprecating the -BE/-LE versions.
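
To make the 'cat tagless tagged > out' case concrete, here is a
minimal Python sketch of what a plain byte-wise concatenation
produces. The file contents ("first", "second") are made up for
illustration, and the two files are modelled as in-memory byte
strings rather than real files. It only shows current behaviour;
it does not say what a tool *should* do with the BOM.

```python
import codecs

# Hypothetical contents: "tagless" has no BOM, "tagged" starts with one.
tagless = "first\n".encode("utf-8")
tagged = codecs.BOM_UTF8 + "second\n".encode("utf-8")

# Byte-wise concatenation, which is what `cat tagless tagged > out` does.
out = tagless + tagged

# Decoded as plain UTF-8, the BOM survives as U+FEFF in mid-text.
text = out.decode("utf-8")
bom_index = text.find("\ufeff")

# "utf-8-sig" strips a BOM only at the very start of the data,
# so it leaves this embedded one untouched.
text_sig = out.decode("utf-8-sig")

print(repr(text), bom_index, text_sig == text)
```

With the order reversed ('cat tagged tagless'), the BOM would sit
at the start of the output and "utf-8-sig" would strip it.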

I don't agree with just about anything you say about automatic
metadata provision. I know that, in Germany, many, many small
libraries are being closed because there is not enough money
available to keep up with the digital race, and even the larger
ones *do* have problems keeping up! I've mentioned bitsavers
already, but that is a drop in the bucket, almost rhetorical. In
other countries the situation is worse.

Steven
Received on Fri Jul 13 2012 - 15:41:01 CDT
