Re: UTF-8 BOM (Re: Charset declaration in HTML)

From: Martin J. Dürst <duerst_at_it.aoyama.ac.jp>
Date: Wed, 18 Jul 2012 10:47:03 +0900

Hello Philippe,

On 2012/07/18 3:37, Philippe Verdy wrote:
> 2012/7/17 Julian Bradfield<jcb+unicode_at_inf.ed.ac.uk>:
>> On 2012-07-16, Philippe Verdy<verdy_p_at_wanadoo.fr> wrote:
>>> I am also convinced that even Shell interpreters on Linux/Unix should
>>> recognize and accept the leading BOM before the hash/bang starting
>>> line (which is commonly used for filetype identification and runtime
>> The kernel doesn't know or care about character sets. It has a little
>> knowledge of ASCII (or possibly EBCDIC) hardwired, but otherwise it deals
>> with 8-bit bytes. It has no concept of "text file".
>
> Yes I know. But most tools and scripts should know what type of
> file they are operating on. Unfortunately the tools are also
> agnostic and just rely on things that do not survive the transport
> protocols, such as filename conventions.

Just writing that you are convinced about something a shell should do
doesn't change anything. Maybe you can create a patch (or a few patches,
because there are quite a few tools out there in the Linux/Unix world)
and see if you can convince the respective maintainers that it's indeed
a good idea.

[Like others with some Linux/Unix background, I strongly doubt that
the BOM is a good idea on Linux/Unix.]
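To make the byte-level point quoted above concrete, here is a minimal Python sketch. It is an illustration only and not kernel source. It models the shebang test as a plain comparison of the first two bytes against "#!" and shows that a UTF-8 BOM (bytes EF BB BF) in front of "#!" makes that test fail. The BOM-tolerant variant is hypothetical. It only shows what the kind of patch proposed above would have to do in every tool that inspects the start of a file.

```python
UTF8_BOM = b"\xef\xbb\xbf"  # UTF-8 encoding of U+FEFF

def has_shebang(data: bytes) -> bool:
    # Simplified model of the byte-level check: the first two bytes
    # must be '#' and '!'. No character-set knowledge is involved.
    return data[:2] == b"#!"

def has_shebang_bom_tolerant(data: bytes) -> bool:
    # Hypothetical variant: skip a leading UTF-8 BOM before checking.
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]
    return has_shebang(data)

plain = b"#!/bin/sh\necho hello\n"
with_bom = UTF8_BOM + plain

print(has_shebang(plain))                  # True
print(has_shebang(with_bom))               # False: file starts with EF BB BF
print(has_shebang_bom_tolerant(with_bom))  # True
```

The plain check never sees "#!" when a BOM is present, because it compares raw bytes.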

Regards, Martin.
Received on Tue Jul 17 2012 - 20:50:45 CDT