From: Pim Blokland (pblokland@planet.nl)
Date: Sun Mar 23 2003 - 08:43:46 EST
> in MS-DOS, file3 will have the following contents:
>
> BOM
> contents from file1
> BOM
> contents from file2
>
> Is this in accordance with the Unicode standard
Nope. When concatenating two files (or any streams) of which the
second one has a BOM, the second one's BOM should be deleted.
However, there's a rule which states that if a U+FEFF character
appears in the middle of a file, it should be treated as a zero
width no-break space, that is, with the same effect as a word joiner
(U+2060). So it's not as big a problem as it may look.
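A minimal Python sketch of the "strip the second stream's BOM" rule. It assumes both inputs are UTF-8 encoded byte strings; the function name concat_utf8 is made up for illustration, and UTF-16/UTF-32 BOMs are not handled.

```python
import codecs

UTF8_BOM = codecs.BOM_UTF8  # the bytes EF BB BF


def concat_utf8(*chunks: bytes) -> bytes:
    """Join UTF-8 byte streams (hypothetical helper, UTF-8 only).

    The first stream is kept as-is, with or without a BOM. A leading BOM
    on any later stream is stripped so it does not end up in the middle
    of the result as a U+FEFF character.
    """
    out = bytearray()
    for i, chunk in enumerate(chunks):
        if i > 0 and chunk.startswith(UTF8_BOM):
            chunk = chunk[len(UTF8_BOM):]
        out += chunk
    return bytes(out)


file1 = UTF8_BOM + "contents from file1\n".encode("utf-8")
file2 = UTF8_BOM + "contents from file2\n".encode("utf-8")

# Naive byte-for-byte concatenation, as with MS-DOS COPY: decoding with
# "utf-8-sig" drops only the leading BOM, so a U+FEFF survives mid-text.
naive = (file1 + file2).decode("utf-8-sig")
print("\ufeff" in naive)   # True

joined = concat_utf8(file1, file2).decode("utf-8-sig")
print("\ufeff" in joined)  # False
```

In the naive case the stray U+FEFF would then be read as a zero width no-break space, which is why the result is usually harmless, just untidy.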
But now you've got me wondering whether there are any rules or
guidelines for the situation where two files are joined, and the
second one has a BOM, but the first one hasn't. Should the resulting
file have a BOM? I.e., should a BOM be added to what were the contents
of the first file?
Pim Blokland
This archive was generated by hypermail 2.1.5 : Sun Mar 23 2003 - 09:23:24 EST