Re: BOM's at Beginning of Web Pages?

From: Jonathan Coxhead (jonathan@doves.demon.co.uk)
Date: Tue Feb 18 2003 - 16:14:15 EST


       That's a very long-winded way of writing it!

       How about this:

          #!/usr/bin/perl -pi~ -0777
          # program to remove a leading UTF-8 BOM from a file
          # works both STDIN -> STDOUT and on the spot (with filename as argument)
          s/^\xEF\xBB\xBF//s;

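    which uses Perl's -p (loop over the input and print it), -i~ (edit the
    named file in place, keeping a backup with a ~ suffix) and -0777 (read
    the whole file in one go) options to the same effect.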
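
    For anyone unfamiliar with those switches, here is a rough sketch of what
    they amount to when spelled out by hand. The sub names strip_bom and
    filter_handle are invented for illustration, and the demo runs on
    in-memory handles (Perl 5.8 or later) rather than real files. It covers
    only the read-edit-print part; the in-place rewrite and backup that -i
    adds are left out.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (name invented here): drop one leading UTF-8 BOM.
sub strip_bom {
    my ($data) = @_;
    $data =~ s/^\xEF\xBB\xBF//;    # same substitution as the one-liner
    return $data;
}

# Roughly what -0777 plus -p do for one input: read everything in a
# single gulp, apply the edit, print the result.
sub filter_handle {
    my ($in, $out) = @_;
    local $/;                       # undef separator = slurp mode (-0777)
    my $data = <$in>;
    $data = '' unless defined $data;
    print {$out} strip_bom($data);
    return;
}

# Demo on in-memory handles so that nothing on disk is touched.
my $input  = "\xEF\xBB\xBF<html>\n</html>\n";
my $output = '';
open my $in,  '<', \$input  or die "cannot open input: $!";
open my $out, '>', \$output or die "cannot open output: $!";
filter_handle($in, $out);
close $out;
print $output;                      # the two HTML lines, without the BOM
```

    Because the whole input is one string in slurp mode, ^ can only match at
    the very start of the file, so no line counter is needed.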

       On 17 Feb 2003, at 17:36, Martin Duerst wrote:

    > #!/usr/bin/perl
    >
    > # program to remove a leading UTF-8 BOM from a file
    > # works both STDIN -> STDOUT and on the spot (with filename as argument)
    >
    > if ($#ARGV > 0) {
    >     print STDERR "Too many arguments!\n";
    >     exit;
    > }
    >
    > my @file; # file content
    > my $lineno = 0;
    >
    > my $filename = $ARGV[0];
    > if ($filename) {
    >     open BOMFILE, "$filename";
    >     while (<BOMFILE>) {
    >         if (!$lineno++) {
    >             s/^\xEF\xBB\xBF//;
    >         }
    >         push @file, $_ ;
    >     }
    >     close BOMFILE;
    >     open NOBOMFILE, ">$filename";
    >     foreach $line (@file) {
    >         print NOBOMFILE $line;
    >     }
    >     close NOBOMFILE;
    > }
    > else { # STDIN -> STDOUT
    >     while (<>) {
    >         if (!$lineno++) {
    >             s/^\xEF\xBB\xBF//;
    >         }
    >         push @file, $_ ;
    >     }
    >     foreach $line (@file) {
    >         print $line;
    >     }
    > }

            /|
     o o o (_|/
            /|
           (_/


