Philippe Verdy <verdy_p_at_wanadoo.fr> wrote:
|2012/7/13 Steven Atreju <snatreju_at_googlemail.com>:
|> Philippe Verdy <verdy_p_at_wanadoo.fr> wrote:
|>
|> |2012/7/12 Steven Atreju <snatreju_at_googlemail.com>:
|> |> UTF-8 is a bytestream, not multioctet(/multisequence).
|> |Not even. UTF-8 is a text-stream, not made of arbitrary sequences of
|> |bytes. It has a lot of internal semantics and constraints.
|> |The effective binary encoding of text streams should NOT play any
|> |semantic role (all UTFs should completely be equivalent on the text
|> |interface, the bytestream low level is definitely not suitable for
|> |handling text and should not play any role in any text parser or
|> |collator).
|>
|> I don't understand what you are saying here.
|> UTF-8 is a data interchange format, a text-encoding.
|> It is not a filetype!
|
|Not only! It is a format which is unambiguously bound to a text
|filetype, even if this file type may not be intended to be interpreted
|by humans (e.g. program sources or rich text formats like HTML).
|
|> A BOM is a byte-order-mark, used to signal different host endiannesses.[...]
|
|I've been on this list long enough to know all this already. And i've
|not contradicted this role. However this is not prescriptive for
|anything other than text file types (whatever they are). For example
|BOMs have absolutely no role in encoding binary images, even if they
|include internal multibyte numeric fields.
Sure, i know the former and i bet there has been a lot of discussion.
Well, it boils down to that, doesn't it? If Unicode *defines* that
the so-called BOM is in fact a Unicode-indicating tag that MUST
be present, then it is very clear what has to happen for, say,
'$ cat tagless tagged > out' (in a UTF-8 environment). I don't
agree with that, though, for the reasons i tried to put into
English words, but this is solely my problem. Another approach
would be an explicit UTF-8-BOM charset. Or, of course,
deprecating the -BE/-LE versions.
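
To make the 'cat' case above concrete, here is a minimal Python sketch
of what plain byte concatenation does today, when no tag is required.
The two byte strings stand in for the files 'tagless' and 'tagged';
their contents are made up for illustration. It only assumes the
standard library's codecs.BOM_UTF8 constant and the "utf-8-sig" codec.

```python
import codecs

# Hypothetical contents of the two files: one UTF-8 text without a
# leading BOM, one with it.
tagless = "first\n".encode("utf-8")
tagged = codecs.BOM_UTF8 + "second\n".encode("utf-8")

# '$ cat tagless tagged > out' is a plain byte concatenation.
out = tagless + tagged

# The "utf-8-sig" codec strips a BOM only at the very start of the
# data, so the tag of 'tagged' survives as U+FEFF inside the text.
text = out.decode("utf-8-sig")
bom_index = text.find("\ufeff")

# In the other order the tag is in front and gets stripped on decode.
swapped = (tagged + tagless).decode("utf-8-sig")

print(bom_index, repr(text), repr(swapped))
```

So 'out' carries no tag at its start but does carry a stray U+FEFF in
the middle, which is the situation a byte-oriented tool cannot fix
without knowing that its input is text.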
I don't agree with just about anything you say about automatic
metadata provision. I know that, in Germany, many, many small
libraries are being closed because there is not enough money
available to keep up with the digital race, and even the larger
ones *do* have problems keeping up! I've mentioned bitsavers
already, but this is a drop in the bucket, almost rhetorical. In
other countries the situation is worse.
Steven
Received on Fri Jul 13 2012 - 15:41:01 CDT