RE: Several BOMs in the same file

From: Marco Cimarosti (marco.cimarosti@essetre.it)
Date: Tue Mar 25 2003 - 13:38:52 EST

    Kent Karlsson wrote:
    > > I'm not going into the implementation part; just pointing out that
    > > this issue is not something an operating system can ignore.
    >
    > "cat" and "cp" can and shall ignore it. They are octet-level
    > file operations, attaching no semantics to the octets. Try "iconv".

    This byte-level operation is just the default behavior, and it should
    of course remain the default.

    However, there already are a lot of options specific to text files, that
    *do* attach character semantics to octets, such as the "-n" option to number
    output lines:

            http://www.hmug.org/man/1/cat.html

    As a minimum, option "-n" must know the semantics of the NL and LF control
    codes, of the digits, and of white space.
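
    To make that dependency concrete, here is a minimal sketch of line
    numbering over raw bytes. The function name number_lines and the
    "%6d<TAB>" layout are assumptions for illustration, not taken from any
    particular "cat" implementation. It treats only the byte 0x0A as a line
    end and writes the number as ASCII digits, so it is only meaningful for
    ASCII-compatible encodings; on UTF-16 input it would give wrong results.

```python
def number_lines(data: bytes) -> bytes:
    """Prefix every LF-terminated line with a right-aligned line number."""
    lines = data.split(b"\n")
    # A trailing LF leaves an empty final piece, which is not a line.
    ends_with_lf = lines[-1] == b""
    if ends_with_lf:
        lines.pop()
    out = bytearray()
    for n, line in enumerate(lines, 1):
        # The number is ASCII digits, followed by white space (a tab).
        out += b"%6d\t" % n + line
        # Restore the LF, except after an unterminated last line.
        if n < len(lines) or ends_with_lf:
            out += b"\n"
    return bytes(out)
```

    Even this small routine already assumes which byte ends a line, how
    digits are encoded, and what counts as white space.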

    There is no technical reason not to add further options that deal more
    sensibly with the encoding(s) of the text file(s) involved. Again, any such
    text-specific option must be disabled by default, in order to preserve the
    basic byte-by-byte operation.
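
    As a rough sketch of what such an opt-in option could look like for the
    BOM case in the subject line: the function concat and its flag
    strip_inner_boms are hypothetical names, not an existing "cat" option,
    and the sketch handles only the UTF-8 BOM. With the flag off, the
    concatenation is purely byte-by-byte.

```python
import codecs


def concat(parts, strip_inner_boms=False):
    """Concatenate byte strings; optionally drop UTF-8 BOMs after the first part."""
    if not strip_inner_boms:
        # Default: plain octet-level concatenation, no semantics attached.
        return b"".join(parts)
    out = bytearray()
    for i, part in enumerate(parts):
        # Keep a BOM at the very start of the output, drop it elsewhere.
        if i > 0 and part.startswith(codecs.BOM_UTF8):
            part = part[len(codecs.BOM_UTF8):]
        out += part
    return bytes(out)
```

    For example, concat([a, b]) returns a + b unchanged, while
    concat([a, b], strip_inner_boms=True) removes a UTF-8 BOM at the start
    of b before appending it.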

    _ Marco


