Re: Several BOMs in the same file

From: Pim Blokland (pblokland@planet.nl)
Date: Sun Mar 23 2003 - 08:43:46 EST


    > in MS-DOS, file3 will have the following contents:
    >
    > BOM
    > contents from file1
    > BOM
    > contents from file2
    >
    > Is this in accordance with the Unicode standard

    Nope. When concatenating two files (or any streams) of which the
    second one has a BOM, the second one's BOM should be deleted.
    However, there's a rule which states that if a U+FEFF character
    appears in the middle of a file, it should be treated as a zero
    width no-break space, which has the same effect as WORD JOINER
    (U+2060). So it's not as big a problem as it may look.
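    A minimal sketch of the concatenation rule above, assuming every input
    is UTF-8 (the helper name concat_utf8 is hypothetical, not from any
    tool): a BOM at the start of any stream after the first is stripped,
    so it never ends up as U+FEFF in the middle of the output.

```python
from collections.abc import Iterable
import codecs

UTF8_BOM = codecs.BOM_UTF8  # b'\xef\xbb\xbf'


def concat_utf8(chunks: Iterable[bytes]) -> bytes:
    """Join UTF-8 byte streams, keeping only the first stream's BOM.

    Assumption: all chunks are UTF-8. A leading BOM on any later chunk
    is dropped instead of being copied into the middle of the result.
    """
    out = bytearray()
    for i, chunk in enumerate(chunks):
        if i > 0 and chunk.startswith(UTF8_BOM):
            chunk = chunk[len(UTF8_BOM):]  # strip the interior BOM
        out += chunk
    return bytes(out)


file1 = UTF8_BOM + "contents from file1\n".encode("utf-8")
file2 = UTF8_BOM + "contents from file2\n".encode("utf-8")

joined = concat_utf8([file1, file2])
naive = file1 + file2  # what a plain byte-level copy produces

print(repr(joined.decode("utf-8-sig")))
print(repr(naive.decode("utf-8-sig")))
```

    Python's "utf-8-sig" codec removes only a leading BOM, so the naive
    result decodes with a U+FEFF between the two files' contents. That
    character is invisible when treated as a zero width no-break space,
    which is why the doubled BOM is mostly harmless.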

    But now you've got me wondering whether there are any rules or
    guidelines for the situation where two files are joined, and the
    second one has a BOM, but the first one hasn't. Should the resulting
    file have a BOM? I.e., should a BOM be added in front of what was
    the contents of the first file?

    Pim Blokland
