Re: 32'nd bit & UTF-8

From: Hans Aberg (
Date: Wed Jan 19 2005 - 17:51:30 CST


    At 21:09 +0100 2005/01/19, Marcin 'Qrczak' Kowalczyk wrote:
    >> On the very contrary. It's most helpful to determine a text file's
    >> encoding. Without the UTF8 BOM it's hard to tell whether a file is
    >> encoded in some ISO or whatever encoding/codepage or is already UTF8.
    >The problem with BOM in UTF8 is that it must be specially handled by
    >all applications. It effectively turns UTF-8 into a stateful encoding
    >where the beginning of a "text stream" must be treated specially.
    >World would be simpler if UTF-8 BOM was banned.
    >Fortunately I have never met a Unix program which used a UTF-8 BOM,
    >so I can mostly ignore the issue, except that text files coming from
    >Windows may have that annoying thing at the beginning which must be

    The main point is that a BOM will not be treated specially in the UNIX world,
    regardless of what Unicode says. So I guess MS does not want its text files to
    be read in the UNIX world. Unicode has made the mistake of favoring one
    particular platform over all the others.
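
For anyone who does have to read such files, the special handling discussed above amounts to checking the first three bytes of the stream for EF BB BF and dropping them. A minimal sketch in Python follows; the helper names strip_utf8_bom and decode_text are hypothetical and only for illustration, and this is one possible approach rather than anything Unicode or a particular platform prescribes.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (bytes EF BB BF), if present.

    Hypothetical helper: only the very start of the stream is inspected,
    which is exactly the "stateful" special case at issue.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def decode_text(data: bytes) -> str:
    """Decode UTF-8 bytes, tolerating an optional leading BOM.

    Hypothetical helper: relies on Python's "utf-8-sig" codec, which
    skips a BOM at the start of the input and otherwise behaves like
    plain UTF-8.
    """
    return data.decode("utf-8-sig")
```

A plain "utf-8" decode would instead leave U+FEFF as the first character of the text, which is the annoyance a BOM-unaware program sees when it opens such a file.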

      Hans Aberg
