Re: UTF-8 and POSIX locales

From: Markus Kuhn (Markus.Kuhn@cl.cam.ac.uk)
Date: Thu Jun 24 1999 - 05:17:51 EDT


Karlsson Kent - keka wrote on 1999-06-24 08:24 UTC:
> It would be better to always use the data found in
> the 'Unicode character database' file, rather than that found
> in other lists of character properties, lists that are not kept
> as up-to-date nor are as keenly reviewed. And even if they
> so were, might still arbitrarily, and implicitly, diverge from
> Unicode's data.

Well, it should not be too difficult - I hope - to automatically
generate one file from the other. If someone provides me access to the
specification, I might find the time to write a small Perl script that
does exactly that, fully automatically and repeatably.
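
As a rough illustration of such a conversion, here is a small Python
sketch. It reads lines in the semicolon-separated UnicodeData.txt layout
(code point in field 0, general category in field 2) and prints character
classes using <Uxxxx> symbolic names. The function names, the
category-to-class mapping, and the output syntax are assumptions made for
illustration only; they are not taken from the POSIX specification, and
range entries in the database are not handled.

```python
from collections import defaultdict

# Assumed mapping from Unicode general categories to ctype-style classes.
# A real conversion would need many more rules; this is illustrative only.
CATEGORY_TO_CLASS = {"Lu": "upper", "Ll": "lower", "Nd": "digit"}


def classify(lines):
    """Group code points by class, reading UnicodeData.txt-style lines."""
    classes = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(";")
        code_point = int(fields[0], 16)  # field 0: code point in hex
        category = fields[2]             # field 2: general category
        cls = CATEGORY_TO_CLASS.get(category)
        if cls is not None:
            classes[cls].append(code_point)
    return {name: sorted(points) for name, points in classes.items()}


def format_classes(classes):
    """Render each class as one line of <Uxxxx> symbolic names."""
    out = []
    for name in sorted(classes):
        symbols = ";".join(f"<U{cp:04X}>" for cp in classes[name])
        out.append(f"{name} {symbols}")
    return "\n".join(out)


# Three sample records in the UnicodeData.txt field layout.
SAMPLE = [
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041",
    "0030;DIGIT ZERO;Nd;0;EN;;0;0;0;N;;;;;",
]

print(format_classes(classify(SAMPLE)))
```

Because the output is derived mechanically from the database, rerunning
the script against a newer database release would regenerate the classes
without hand editing.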

What I do not like at the moment about the locale mechanism is that it
ties together the character encoding and the cultural conventions, which
I think are two completely orthogonal things. Even worse, there are some
systems out there that provide UTF-8 locales only tied to a cultural
convention set; in the worst case they provide only en_US.UTF-8. So to
get UTF-8, I would also have to accept the strange US date/time
notations and in some programs even default settings for non-metric
units and strange US paper sizes, US-specific terminology such as "ZIP
code" instead of "postal code", etc., all derived from the "en_US" part
of the locale name.
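
To make the coupling concrete, the following Python sketch splits a locale
name into its parts, assuming the common
language[_territory][.codeset][@modifier] naming convention. The helper
name split_locale_name is hypothetical, and actual locale names are
implementation-defined, so treat this as a sketch rather than a parser for
every system.

```python
def split_locale_name(name):
    """Split 'language[_territory][.codeset][@modifier]' into its parts."""
    rest, _, modifier = name.partition("@")
    rest, _, codeset = rest.partition(".")
    language, _, territory = rest.partition("_")
    return {
        "language": language,
        "territory": territory or None,  # None when the part is absent
        "codeset": codeset or None,
        "modifier": modifier or None,
    }


parts = split_locale_name("en_US.UTF-8")
# The encoding ("UTF-8") and the cultural conventions ("en", "US")
# arrive bundled in a single name.
print(parts["codeset"], parts["language"], parts["territory"])
```

On a system that offers only en_US.UTF-8, asking for the "UTF-8" codeset
through this one name also selects the "en" and "US" parts.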

The standard should specify the names of some default locales that do
specify an encoding, but that otherwise copy just the cultural
conventions of the C or POSIX default locales. The names of these
locales could for instance be

   POSIX.UTF-8
   POSIX.ISO_8859-1

etc., or perhaps even better just

  UTF-8
  ISO_8859-1

etc. I also like the idea of a standard locale named "ISO.UTF-8" or
"international" that uses UTF-8 and fills in other cultural conventions
according to ISO standards, e.g. ISO 8601 for the date/time notation
and ISO 31 for the formatting of monetary units (currency appended after
the number, separated by a space, just like any SI unit).
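
A minimal Python sketch of what such conventions might produce is shown
here. The function names and the choice of two decimal places are
assumptions made for illustration; no existing locale defines them.

```python
from datetime import datetime


def format_timestamp(moment):
    """Format a datetime in ISO 8601 extended notation, to the second."""
    return moment.isoformat(timespec="seconds")


def format_amount(amount, currency_code):
    """Append the currency code after the number, separated by a space."""
    return f"{amount:.2f} {currency_code}"


print(format_timestamp(datetime(1999, 6, 24, 9, 17, 51)))  # 1999-06-24T09:17:51
print(format_amount(12.5, "EUR"))                          # 12.50 EUR
```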

Markus

P.S.: For the new single currency that will combine around 2038 the USD,
JPY, EUR, etc. into a global monetary union, I'd like to suggest the name
"iso". ;-)

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>


