Re: BOM's at Beginning of Web Pages?

From: Frank da Cruz (fdc@columbia.edu)
Date: Mon Feb 17 2003 - 09:59:05 EST

    On Mon, 17 Feb 2003 08:13:51 -0500 (EST),
    Jungshik Shin <jshin@mailaps.org> wrote:

    > Incidentally, it just occurred to me that ftp/ssh clients may offer a
    > user-configurable option for the automatic removal of 'UTF-8 BOM' at
    > the beginning of a text file in UTF-8 when moving files from Windows to
    > non-Windows platforms (Unix/Unix-like OS and MacOS). The same is true
    > of Kermit (Frank, are you here?).
    >
    Yes, Kermit does this in both the Kermit and FTP protocols (for those who
    haven't heard, Kermit is now also a Unicode-aware FTP client):

      http://www.columbia.edu/kermit/ftpclient.html

    > All those tools can be configured
    > to translate between three (and nowadays even more?) EOL conventions,
    > CR/LF/CRLF for text files.
    >
    Kermit on a particular platform understands the text record format of
    that platform (CR, LF, CRLF, 80-column card images, length fields, etc)
    and converts between it and the standard transfer format, i.e. lines
    terminated by CRLF. Thus it converts between all combinations of record
    formats among all the platforms where it runs as a fundamental aspect of
    text-mode file transfer. This applies to both Kermit and FTP transfers.
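
    As a rough illustration, here is a minimal Python sketch of that two-step
    conversion (local format -> CRLF transfer form -> remote format) for the
    simple stream formats only: LF, CR and CRLF. It is not Kermit's code; the
    function and platform names are hypothetical, it assumes well-formed input
    with no stray CR or LF inside a line, and it does not attempt card images
    or length-prefixed records.

```python
# Minimal sketch, not Kermit's code: convert text between a local
# line-terminator convention and a canonical CRLF transfer form.
# Assumes well-formed input (no stray CR or LF inside a line); record
# formats such as card images or length fields are not handled.

# Hypothetical platform names mapped to their line terminators.
TERMINATORS = {
    "unix": b"\n",
    "classic_mac": b"\r",
    "windows": b"\r\n",
}

CRLF = b"\r\n"


def to_transfer_form(data: bytes, platform: str) -> bytes:
    """Local record format -> lines terminated by CRLF."""
    return data.replace(TERMINATORS[platform], CRLF)


def from_transfer_form(data: bytes, platform: str) -> bytes:
    """Lines terminated by CRLF -> local record format."""
    return data.replace(CRLF, TERMINATORS[platform])


def transfer(data: bytes, sender: str, receiver: str) -> bytes:
    """Sender converts to the standard form; receiver converts from it."""
    return from_transfer_form(to_transfer_form(data, sender), receiver)


if __name__ == "__main__":
    print(transfer(b"one\ntwo\n", "unix", "windows"))  # b'one\r\ntwo\r\n'
```

    Because each platform only converts to and from the one CRLF form, adding
    a new record format means writing one pair of converters rather than one
    converter for every other platform.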

    > Then, the automatic removal (and addition if
    > that's regarded as necessary) of UTF-8 BOM at platform boundaries
    > would be as useful.
    >
    Kermit's BOM removal occurs (or not, as desired) on a per-file basis
    (not per record). Kermit's Unicode features are described here:

      http://www.columbia.edu/kermit/ckermit70.html#x6.6

    and (for the FTP client) here:

      http://www.columbia.edu/kermit/ckermit80.html#x3.7.1
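
    To make the per-file (not per-record) distinction concrete, here is a
    minimal Python sketch, assuming a UTF-8 file held in memory as bytes. The
    function names and the strip_bom flag are hypothetical, not Kermit
    settings. Only a BOM in the first three bytes of the file is touched; a
    U+FEFF at the start of a later line is left alone, which is exactly what a
    per-record approach would get wrong.

```python
import codecs

UTF8_BOM = codecs.BOM_UTF8  # the bytes EF BB BF


def strip_leading_bom(data: bytes) -> bytes:
    """Remove a UTF-8 BOM only when it begins the file (per file, not per line)."""
    return data[len(UTF8_BOM):] if data.startswith(UTF8_BOM) else data


def add_leading_bom(data: bytes) -> bytes:
    """Prepend a UTF-8 BOM unless the file already starts with one."""
    return data if data.startswith(UTF8_BOM) else UTF8_BOM + data


def prepare_for_receiver(data: bytes, strip_bom: bool) -> bytes:
    """Hypothetical per-file switch: strip the BOM, or leave the file as is."""
    return strip_leading_bom(data) if strip_bom else data


if __name__ == "__main__":
    # The second line also begins with U+FEFF; only the first one is removed.
    sample = UTF8_BOM + b"first\n" + UTF8_BOM + b"second\n"
    print(prepare_for_receiver(sample, strip_bom=True))
```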

    For those who don't know, Kermit converts not only record format but also
    character sets using the same technique: local set -> standard intermediate
    set -> remote set. The repertoire of character sets it knows about
    depends on the platform, but is likely to include PC and Windows code
    pages, various other corporate sets such as HP-Roman8, ISO 646 and 8859
    sets, the many KOI and JIS variations, and different forms of Unicode.

    This occurs during text-mode file transfer as well as online terminal
    connections (serial, modem, telnet, ssh, etc).
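
    The local set -> intermediate set -> remote set scheme can be sketched with
    Python's codecs: the sender re-encodes its local bytes into an agreed
    transfer character set, and the receiver re-encodes from that set into its
    own. This illustrates the general pivot technique under the assumption
    that UTF-8 is the transfer set; it is not how Kermit negotiates or names
    its character sets, and the function names are hypothetical.

```python
# Minimal sketch of pivot-style character-set conversion, assuming UTF-8
# as the agreed transfer character set. Not Kermit's code or its set names;
# the codec names used here are Python's.


def sender_convert(local_bytes: bytes, local_cs: str,
                   transfer_cs: str = "utf-8") -> bytes:
    """Local character set -> transfer character set (what goes on the wire)."""
    return local_bytes.decode(local_cs).encode(transfer_cs)


def receiver_convert(wire_bytes: bytes, remote_cs: str,
                     transfer_cs: str = "utf-8", errors: str = "replace") -> bytes:
    """Transfer character set -> the receiver's local character set.

    With the default errors="replace", characters the remote set cannot
    represent become '?'.
    """
    return wire_bytes.decode(transfer_cs).encode(remote_cs, errors=errors)


if __name__ == "__main__":
    wire = sender_convert(b"caf\xe9", "cp1252")  # "café" from a Windows code page
    print(receiver_convert(wire, "cp437"))       # same text in the IBM PC code page
    print(receiver_convert(wire, "koi8_r"))      # KOI8-R has no é, so it becomes '?'
```

    With a common intermediate set, each platform needs converters only to and
    from that one set, so supporting N character sets takes on the order of N
    conversion tables instead of one for every pair of sets.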

    Kermit also has a TRANSLATE command for converting the character sets
    of local text files, and this too can add or remove BOMs at the user's
    discretion.
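
    A local conversion with a BOM option, in the spirit of that TRANSLATE
    command, might look like the following Python sketch. The translate() and
    translate_file() names and the "keep"/"add"/"remove" choices are
    hypothetical stand-ins, not Kermit's TRANSLATE syntax. In this sketch a
    BOM is only ever written when the target set is UTF-8.

```python
import codecs
from pathlib import Path

BOM_CHOICES = ("keep", "add", "remove")


def translate(data: bytes, from_cs: str, to_cs: str, bom: str = "keep") -> bytes:
    """Re-encode text from one character set to another, managing the BOM.

    bom="keep" preserves a leading BOM if the input had one, "add" forces
    one, and "remove" drops it. A BOM is emitted only for UTF-8 output.
    """
    if bom not in BOM_CHOICES:
        raise ValueError(f"bom must be one of {BOM_CHOICES}")
    text = data.decode(from_cs)
    had_bom = text.startswith("\ufeff")
    if had_bom:
        text = text[1:]  # never convert the BOM as if it were ordinary text
    want_bom = had_bom if bom == "keep" else bom == "add"
    out = text.encode(to_cs)
    if want_bom and codecs.lookup(to_cs).name == "utf-8":
        out = codecs.BOM_UTF8 + out
    return out


def translate_file(src: Path, dst: Path, from_cs: str, to_cs: str,
                   bom: str = "keep") -> None:
    """Apply translate() to a whole local file."""
    dst.write_bytes(translate(src.read_bytes(), from_cs, to_cs, bom))
```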

    - Frank



    This archive was generated by hypermail 2.1.5 : Mon Feb 17 2003 - 10:44:45 EST