RE: Several BOMs in the same file

From: Kent Karlsson (kentk@md.chalmers.se)
Date: Tue Mar 25 2003 - 06:14:59 EST

    > Your command above would now expand to something like this:
    >
    > cat -R UTF-16 -F UTF-16LE file1 -F Big-5 file2 > file3
    >
    > Provided with information about the input encodings and the expected
    > output encoding, "cat" could now correctly handle BOM's, endianness,
    > new-line conventions, and even perform character set conversions.
    > Without this extra info, "cat" would retain its good ol' byte-by-byte
    > functionality.
    >
    > Similar options could be added to any Unix command potentially dealing
    > with text files ("cp", "head", "tail", etc.), as well as to their
    > equivalents in DOS or other operating systems.

    To avoid "flag bloat", one can instead use the "iconv" command and
    apply it to the source files. Since "head" and "tail" assume an
    ASCII-compatible single-byte or multibyte encoding in which any
    state is reset at LF, the target encoding given to iconv must,
    for those commands, be such an encoding (UTF-8, for example).
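
    As a rough sketch of that approach, assuming an iconv that knows the
    encoding names UTF-16LE and BIG5 (true of glibc and GNU libiconv, but
    names can differ elsewhere): the sample files and the temporary
    directory below are made up for illustration, and UTF-8 is chosen as
    the intermediate encoding only because it satisfies the constraint
    above.

```shell
#!/bin/sh
set -eu
tmp=$(mktemp -d)

# Hypothetical sample inputs: file1 in UTF-16LE (no BOM), file2 in Big5.
# The octal escapes are the UTF-8 bytes of U+4E2D.
printf 'alpha\nbeta\n' | iconv -f UTF-8 -t UTF-16LE > "$tmp/file1"
printf '\344\270\255\n' | iconv -f UTF-8 -t BIG5 > "$tmp/file2"

# Convert each source to UTF-8 first, then let plain byte-by-byte cat join them.
iconv -f UTF-16LE -t UTF-8 "$tmp/file1" > "$tmp/file1.utf8"
iconv -f BIG5 -t UTF-8 "$tmp/file2" > "$tmp/file2.utf8"
cat "$tmp/file1.utf8" "$tmp/file2.utf8" > "$tmp/file3"

# head needs an ASCII-compatible encoding, so it reads the converted stream.
first_line=$(iconv -f UTF-16LE -t UTF-8 "$tmp/file1" | head -n 1)
printf '%s\n' "$first_line"
```

    If the UTF-16 input does start with a BOM, naming the source encoding
    as plain UTF-16 rather than UTF-16LE typically lets iconv consume the
    BOM instead of passing it through as a character.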

            /kent k


