Re: Character encoding at the prompt

From: Jungshik Shin (jshin@mailaps.org)
Date: Thu Oct 25 2001 - 19:14:55 EDT


Richard, Francois M wrote:

> As a follow-up on this interesting issue, I did the following testing on
> Solaris 2.6:
>

>> setenv LC_ALL en_US.UTF-8
>> env LC_ALL=it date
>
> giovedì, 25 ottobre 2001, 11:45:24 EDT
>
> I could not understand why the letter ì is displayed correctly in the
> en_US.UTF-8 locale. My understanding was that the date command was
> generating the message in the Italian locale (default encoding ISO-8859-1)
> and as a result ì would be encoded as 0xEC. The display should be done in
> the en_US.UTF-8 locale, where that byte would be an invalid byte sequence.

    Where did you run this little experiment: at the
console, or in a terminal emulator (xterm, dtterm, etc.) under X?
If it was the console, I don't know what you have to do
to change its encoding. Just setting LC_ALL in your shell
does not affect any process that's already running, and
that includes the terminal you typed the command in.
Under X, you can start a new terminal in the locale you want:

   % env LC_ALL=en_US.UTF-8 dtterm

  Then repeat your test in the new dtterm window.
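
  To illustrate why the variable has to be in place when the terminal
starts, here is a minimal Python sketch (not Solaris-specific). The
helper name child_lc_all is hypothetical. It starts a child process with
a modified copy of the environment and shows that only the child sees
the new LC_ALL; the parent, which is already running, is untouched.

```python
import os
import subprocess
import sys


def child_lc_all(extra_env=None):
    """Start a child Python process and return the LC_ALL value it sees.

    Hypothetical helper for illustration only.
    """
    env = dict(os.environ)       # copy, so the parent's environment is not modified
    env.pop("LC_ALL", None)      # start from a known state
    if extra_env:
        env.update(extra_env)
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ.get('LC_ALL', ''))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Child started without LC_ALL, then one started with it (like `env LC_ALL=... dtterm`).
    print(repr(child_lc_all()))
    print(repr(child_lc_all({"LC_ALL": "en_US.UTF-8"})))
```

  This mirrors what "env LC_ALL=en_US.UTF-8 dtterm" does: the setting
is handed to the new process at startup rather than pushed into an
existing one.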

> The other question is related to locale setting:
> What is the difference between LC_ALL and LANG, and how are these variables
> used by the OS? In particular, I cannot see any impact on the OS when LANG is
> changed.

   1. If LC_ALL is set, it overrides LANG and every LC_* variable.
   2. If LC_ALL is not set but LANG is set,
      a. for a locale category XXX whose LC_XXX variable is set,
         LC_XXX is used;
      b. for every other locale category, LANG is used.
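
   The lookup order above can be sketched as a small Python function.
This is only an illustration of the rule, not how any C library
implements it. The name effective_locale is hypothetical, and the
fallback to "C" when nothing is set is an assumption (the default is
implementation-defined, though "C"/"POSIX" is common).

```python
def effective_locale(env, category):
    """Return the locale name in effect for `category`, e.g. "LC_TIME".

    `env` is a mapping of environment variable names to values.
    Lookup order: LC_ALL, then the category's own variable, then LANG.
    """
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:            # an unset and an empty variable are treated alike
            return value
    return "C"               # assumption: implementation-defined default


if __name__ == "__main__":
    env = {"LANG": "en_US.UTF-8", "LC_TIME": "it"}
    print(effective_locale(env, "LC_TIME"))    # category variable wins over LANG
    print(effective_locale(env, "LC_CTYPE"))   # falls back to LANG
    env["LC_ALL"] = "en_US.UTF-8"
    print(effective_locale(env, "LC_TIME"))    # LC_ALL overrides everything
```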

> What does the encoding part of the Locale impact? Does it mean that any
> characters processed by the OS are going to be interpreted according to this
> encoding? What are some practical examples of this impact?

    You've already observed one difference. The encoding (codeset) of
your locale determines the encoding of the output of command-line
programs: as you expected, the output of 'date' in it_IT.ISO8859-1 is in
ISO-8859-1 and that of 'date' in it_IT.UTF-8 is in UTF-8. C library
functions such as mbstowcs/mbsrtowcs, wcstombs/wcsrtombs, mbtowc/mbrtowc,
wctomb/wcrtomb and mblen (and iconv() in some implementations) are also
affected by the encoding/codeset of the current locale. As I wrote above,
programs like dtterm and many others running under X11 (+CDE/Openwin)
behave differently as well: different fonts might be used, and the
repertoire of characters that applications can handle is different.
Besides, some input method servers and output methods may only work
with a particular encoding/codeset.
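
    To see the byte-level difference for the 'date' output quoted
above, here is a small Python sketch. It does not call the C library or
depend on any installed locale; it only shows how the same word is
encoded under the two codesets and why the ISO-8859-1 bytes are not
valid UTF-8. The helper name is_valid_utf8 is hypothetical.

```python
def is_valid_utf8(data):
    """Return True if `data` (bytes) decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# "giovedì" written with an escape so the source file's own encoding does not matter.
word = "gioved\u00ec"

latin1_bytes = word.encode("iso-8859-1")   # ì becomes the single byte 0xEC
utf8_bytes = word.encode("utf-8")          # ì becomes the two bytes 0xC3 0xAC

if __name__ == "__main__":
    print(latin1_bytes.hex(" "))
    print(utf8_bytes.hex(" "))
    print(is_valid_utf8(latin1_bytes))     # a lone 0xEC is an incomplete UTF-8 sequence
    print(is_valid_utf8(utf8_bytes))
```

    A terminal that interprets its input as ISO-8859-1 will display
the first byte sequence correctly, while one that expects UTF-8 will
only display the second correctly.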

   If you find the Solaris 2.6 manual pages insufficient,
you can also consult the Single UNIX Specification (UNIX 98),
available at http://www.opengroup.org

   Jungshik Shin


