Re: Unicode and end users

From: Juliusz Chroboczek (jec@dcs.ed.ac.uk)
Date: Thu Feb 14 2002 - 10:57:34 EST


MK> What we are trying to establish is the exact meaning that UNICODE
MK> ought to have - that is, if it can have one at all.

In the Unix-like world, the term ``UTF-8'' has been used quite
consistently, and most documentation avoids using Unicode for a disk
format (using it for the consortium, er, the Consortium, the
character repertoire and, when useful, for the coded character set).

The Unix-like public is used to thinking of UTF-8 as the format in
which Unicode text is saved on disk, and ``UTF-8 (Unicode)'' or
perhaps ``Unicode (UTF-8)'' should be the preferred user-interface
item.

MK> Are there, in fact, many circumstances in which it is necessary
MK> for an end user to create files that do *not* have a BOM at the
MK> beginning?

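You should never use either BOMs or UTF-16 on Unix-like systems; using
either will break too much of the system.  A BOM in front of ``#!''
stops the kernel from recognising a script, and UTF-16 text is full of
NUL bytes, which anything built on C strings treats as end-of-string.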
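
To make the breakage concrete, here is a minimal Python sketch (not
taken from any real tool).  The helper names strip_utf8_bom and
looks_like_script are made up for illustration, and looks_like_script
only approximates the kernel's check that an executable script starts
with the two bytes ``#!''.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def looks_like_script(data: bytes) -> bool:
    """Hypothetical stand-in for the kernel's test: the file must begin with '#!'."""
    return data[:2] == b"#!"


script = "#!/bin/sh\necho hello\n"

plain = script.encode("utf-8")          # what Unix tools expect
with_bom = script.encode("utf-8-sig")   # same text with EF BB BF prepended
utf16 = script.encode("utf-16-le")      # every ASCII character gains a 0x00 byte

print(looks_like_script(plain))                     # BOM-less UTF-8 is recognised
print(looks_like_script(with_bom))                  # the BOM hides the '#!'
print(looks_like_script(strip_utf8_bom(with_bom)))  # fine again once stripped
print(b"\x00" in utf16)                             # NUL bytes confuse C string code
```

Stripping the BOM on input, and never writing one, keeps the file
byte-for-byte compatible with tools that know nothing about Unicode.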

                                        Juliusz


