Re: Subject: Re: 32'nd bit & UTF-8

From: Marcin 'Qrczak' Kowalczyk (qrczak@knm.org.pl)
Date: Wed Jan 19 2005 - 14:09:41 CST

Next message: Marcin 'Qrczak' Kowalczyk: "Re: 32'nd bit & UTF-8"

Previous message: Kenneth Whistler: "Re: 32'nd bit & UTF-8"
In reply to: Oliver Christ: "RE: Subject: Re: 32'nd bit & UTF-8"
Next in thread: Peter Kirk: "Re: Subject: Re: 32'nd bit & UTF-8"
Reply: Peter Kirk: "Re: Subject: Re: 32'nd bit & UTF-8"
Reply: Mark E. Shoulson: "Re: Subject: Re: 32'nd bit & UTF-8"

"Oliver Christ" <oli@trados.com> writes:

> On the very contrary. It's most helpful to determine a text file's
> encoding. Without the UTF8 BOM it's hard to tell whether a file is
> encoded in some ISO or whatever encoding/codepage or is already UTF8.

The problem with a BOM in UTF-8 is that it must be handled specially by
all applications. It effectively turns UTF-8 into a stateful encoding
where the beginning of a "text stream" must be treated specially.
The world would be simpler if the UTF-8 BOM were banned.
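
To illustrate the "stateful" point, here is a minimal Python sketch of what
happens when two BOM-prefixed files are concatenated byte for byte. The
helper name naive_concat and the sample strings are made up for this
example. A BOM-aware decoder only treats the start of the stream specially,
so the second BOM survives as a U+FEFF character in the middle of the text.

```python
import codecs

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def naive_concat(a: bytes, b: bytes) -> bytes:
    """Hypothetical helper: join two files the way `cat` would."""
    return a + b


# Two small "files", each written with a leading UTF-8 BOM.
first = BOM + "hello\n".encode("utf-8")
second = BOM + "world\n".encode("utf-8")

joined = naive_concat(first, second)

# A plain UTF-8 decoder keeps both BOMs as U+FEFF characters.
plain = joined.decode("utf-8")

# The BOM-aware "utf-8-sig" codec strips only the one at the very start.
bom_aware = joined.decode("utf-8-sig")
```

Any tool that joins, splits, or appends to such files has to know about the
BOM to avoid leaving stray U+FEFF characters inside the text.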

Fortunately I have never met a Unix program which used a UTF-8 BOM,
so I can mostly ignore the issue, except that text files coming from
Windows may have that annoying thing at the beginning which must be
stripped.
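
The stripping step can be sketched in a few lines of Python. This is only
an illustration under the assumption that the whole file is already in
memory as bytes; the function name strip_utf8_bom is made up for this
example.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# A file as it might arrive from Windows, and one without a BOM.
with_bom = codecs.BOM_UTF8 + "abc\n".encode("utf-8")
without_bom = "abc\n".encode("utf-8")

cleaned_a = strip_utf8_bom(with_bom)
cleaned_b = strip_utf8_bom(without_bom)
```

When the input is decoded to text anyway, Python's "utf-8-sig" codec does
the same job: it drops one leading BOM if there is one and otherwise
behaves like plain UTF-8.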

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/


This archive was generated by hypermail 2.1.5 : Wed Jan 19 2005 - 14:10:39 CST