From: Kent Karlsson (email@example.com)
Date: Tue Mar 25 2003 - 06:14:59 EST
> Your command above would now expand to something like this:
> cat -R UTF-16 -F UTF-16LE file1 -F Big-5 file2 > file3
> Provided with information about the input encodings and the expected
> output encoding, "cat" could now correctly handle BOMs, endianness,
> new-line conventions, and even perform character set conversions.
> Without this extra info, "cat" would retain its good ol' byte-by-byte
> functionality.
> Similar options could be added to any Unix command potentially dealing
> with text files ("cp", "head", "tail", etc.), as well as to their
> equivalents in DOS or other operating systems.
To avoid "flag bloat", one can instead use the "iconv" command
and apply it to the source files. Since "head" and "tail" assume
an ASCII-compatible single-byte or multibyte encoding, in which any
state is reset at LF, the target encoding for the iconv command
must, for those commands, be such an encoding (UTF-8, for example).
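As a rough sketch of what that could look like in practice: the file
names, the scratch directory, and the sample contents below are all
made up for illustration, and ISO-8859-1 stands in for the second
input encoding (the quoted example used Big-5). Each source is
converted to UTF-8 before it reaches "cat"-style concatenation or
"head".

```shell
# Scratch directory and sample inputs; every name here is hypothetical.
tmp=$(mktemp -d)
printf 'alpha\nbeta\n' | iconv -f UTF-8 -t UTF-16LE > "$tmp/file1"
printf 'gamma\ndelta\n' | iconv -f UTF-8 -t ISO-8859-1 > "$tmp/file2"

# Convert each source to UTF-8 first, then concatenate the results.
{
  iconv -f UTF-16LE -t UTF-8 "$tmp/file1"
  iconv -f ISO-8859-1 -t UTF-8 "$tmp/file2"
} > "$tmp/file3"

# head now sees LF-terminated UTF-8 lines and needs no encoding flag.
head -n 3 "$tmp/file3" > "$tmp/first3"
```

The conversion step carries all the encoding knowledge, so the
downstream commands keep their plain byte-oriented behaviour.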