From: Lars Kristan (lars.kristan@hermes.si)
Date: Thu Jan 20 2005 - 09:36:08 CST
Arcane Jill wrote:
> > You are drawing this analogue too far, because it is fairly easy to fix
> > the \r\n problem, whereas the BOM problem runs deeper. The latter changes
> > the very paradigm for file representation.
>
> I don't see why. What is the difference between discarding U+000Ds and
> discarding U+FEFFs ?
There is some difference. You can concatenate two files containing U+000Ds
blindly. You shouldn't do that with files that have leading U+FEFFs. Also, a
text processing tool can drop the U+000Ds quite safely, knowing exactly what
they represent. Dropping three consecutive bytes (EF BB BF, the UTF-8 form of
U+FEFF) is another story, especially since at the time you process them you
might not even know whether this is the beginning of a file or not (say, when
processing the output of a grep command).
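To make the difference concrete, here is a minimal Python sketch. The helper names (concat_blind, concat_bom_aware, drop_cr) and the sample data are made up for illustration, and the BOM-aware variant assumes each chunk is known to be a complete file, which is exactly the knowledge a tool reading grep output does not have.

```python
import codecs

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def concat_blind(*chunks: bytes) -> bytes:
    """Join byte chunks with no knowledge of their content."""
    return b"".join(chunks)


def concat_bom_aware(*chunks: bytes) -> bytes:
    """Join chunks, stripping a leading BOM from every chunk but the first.

    Assumption: each chunk is a whole file, so a leading EF BB BF
    really is a signature and not part of the text.
    """
    out = []
    for i, chunk in enumerate(chunks):
        if i > 0 and chunk.startswith(BOM):
            chunk = chunk[len(BOM):]
        out.append(chunk)
    return b"".join(out)


def drop_cr(data: bytes) -> bytes:
    """Drop every CR byte; in UTF-8 the byte 0D only ever encodes U+000D."""
    return data.replace(b"\r", b"")


# Two hypothetical files, each with a UTF-8 signature and CRLF line ends.
a = BOM + "first\r\n".encode("utf-8")
b = BOM + "second\r\n".encode("utf-8")

blind = concat_blind(a, b)

# "utf-8-sig" removes only a BOM at the very start of the data,
# so the second file's BOM survives as U+FEFF inside the text.
text = blind.decode("utf-8-sig")

# With file boundaries known, the stray U+FEFF can be avoided.
aware = concat_bom_aware(a, b).decode("utf-8-sig")

# Dropping CRs works anywhere in the stream, but does not touch the BOM.
no_cr = drop_cr(blind).decode("utf-8-sig")
```

Here text still contains one U+FEFF between the two lines, while the CRs could be removed (or left alone) without knowing where either file began.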
The analogy between CRLF and BOM is just in the location where it needs to
be fixed. Probably. More or less. But fixing CRLF is easier. And often you
can get away with not fixing it at all.
Lars