RE: Character encoding at the prompt

From: Richard, Francois M (Francois.M.Richard@usa.xerox.com)
Date: Thu Oct 25 2001 - 12:34:12 EDT

Previous message: Darren Morby: "Re: Letters d L l and t with caron"
Maybe in reply to: Tay, William: "Character encoding at the prompt"
Next in thread: David Starner: "Re: Character encoding at the prompt"
Next in thread: Yves Arrouye: "RE: Character encoding at the prompt"
Reply: David Starner: "Re: Character encoding at the prompt"

As a follow-up on this interesting issue, I did the following testing on
Solaris 2.6:

>setenv LC_ALL en_US
>env LC_ALL=it.UTF-8 date
giovedÃ¬, 25 ottobre 2001, 11:45:24 EDT

This worked as expected, since Thursday is "giovedì" in Italian and ì is
U+00EC, encoded as C3 AC (hex) in UTF-8.
These two bytes are interpreted as ISO-8859-1 (the default encoding on
Solaris) and as a result are displayed as Ã¬
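
As a rough sketch of what happens at the byte level (Python is used here only
for illustration; it was not part of the test on Solaris, and the variable
names are made up), U+00EC can be encoded as UTF-8 and the resulting bytes
decoded as ISO-8859-1:

```python
# Sketch: why U+00EC shows up as two characters on an ISO-8859-1 display.
ch = "\u00ec"  # LATIN SMALL LETTER I WITH GRAVE

# What a UTF-8 locale writes for this character.
utf8_bytes = ch.encode("utf-8")

# What an ISO-8859-1 terminal makes of those bytes: one character per byte.
shown = utf8_bytes.decode("iso-8859-1")

print(utf8_bytes.hex())                      # hex digits of the UTF-8 bytes
print([f"U+{ord(c):04X}" for c in shown])    # code points actually displayed
```

Each byte is shown as its own Latin-1 character (U+00C3 followed by U+00AC),
which matches the two-character sequence in the output above.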

But:
>setenv LC_ALL en_US.UTF-8
>env LC_ALL=it date
giovedì, 25 ottobre 2001, 11:45:24 EDT

I could not understand why the letter ì is displayed correctly in the
en_US.UTF-8 locale. My understanding was that the date command generates the
message in the Italian locale (default encoding ISO-8859-1), so ì would be
encoded as the single byte 0xEC. The display is then done in the en_US.UTF-8
locale, where that byte should be an invalid byte sequence.
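
The expectation above can be checked with a small sketch (again Python, used
only for illustration; is_valid_utf8 is a hypothetical helper, not a system
call): the ISO-8859-1 encoding of U+00EC is the single byte 0xEC, and that
byte followed by an ASCII comma is not well-formed UTF-8.

```python
# Sketch: a lone 0xEC byte (ISO-8859-1 for U+00EC) is not valid UTF-8.

def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# What an ISO-8859-1 locale would write for the start of the date string.
latin1_bytes = "gioved\u00ec, 25".encode("iso-8859-1")

# The same text as a UTF-8 locale would write it.
utf8_bytes = "gioved\u00ec, 25".encode("utf-8")

print(is_valid_utf8(latin1_bytes))  # 0xEC starts a 3-byte sequence, but ',' follows
print(is_valid_utf8(utf8_bytes))
```

So if date really emitted ISO-8859-1 here, a strict UTF-8 display should not
have been able to show the letter correctly.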

The other question is related to locale settings: what is the difference
between LC_ALL and LANG, and how are these variables used by the OS? In
particular, I cannot see any impact on the OS when LANG is changed.

What does the encoding part of the Locale impact? Does it mean that any
characters processed by the OS are going to be interpreted according to this
encoding? What are some practical examples of this impact?

François
> -----Original Message-----
> From: Addison Phillips [wM] [mailto:aphillips@webmethods.com]
> Sent: Wednesday, October 24, 2001 6:18 PM
> To: Tay, William; unicode@unicode.org
> Subject: RE: Character encoding at the prompt
>
>
> Hi William,
>
> The answer is that it depends on the current user locale.
>
> Generally, Western European languages in Windows use Code Page 1252 for GUI
> displays and either Code Page 437 (US English) or Code Page 850 for "dos
> boxes" (the "cmd" prompt). On Windows NT this can be changed manually with
> the "chcp" command. Changing your actual system locale ("Regional Options")
> will also change the windows and command line code pages as appropriate.
> Fair warning: do NOT experiment with Asian locales on European builds of NT
> 4.0 systems (that you care about). In "Microsoft-ese", the Windows code
> page is the ANSI code page and the command line is the OEM code page. In
> this case, ANSI has nothing to do with the standards organization or any
> particular encoding---it's just a name to differentiate the code page from
> the OEM flavor. There is documentation on the MS website that I am too
> pressed for time to look up the URL for.....
>
> On most UNIX-like operating systems, the current locale controls the
> encoding. In fact, the encoding is part of the locale name. Generally
> Western European languages use ISO-8859-1 (aka Latin-1). Solaris 2.7 and
> especially 2.8 add support for nifty new encodings (including UTF-8, a
> Unicode encoding). If you type "locale" at the shell prompt, you will see a
> listing of your various locale settings, which will include the current
> encoding. Unlike Windows, the locale (and thus encoding) apply to both
> command line and GUI interfaces. Also unlike Windows, the locale setting is
> process specific. Child processes inherit the parent's environment, so if
> you change your locale and then launch a GUI program, that program will have
> a matching locale. Of course, this is a generalization.....
>
> Don't forget that file systems and shells have a part to play in your
> command line excursions.
>
> Hope this helps.
>
> Addison
>
> Addison P. Phillips
> Globalization Architect / Manager, Globalization Engineering
> webMethods, Inc. 432 Lakeside Drive, Sunnyvale, CA
> +1 408.962.5487 (phone) +1 408.210.3569 (mobile)
> -------------------------------------------------
> Internationalization is an architecture. It is not a feature.
>
> -----Original Message-----
> From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org]On
> Behalf Of Tay, William
> Sent: Wednesday, October 24, 2001 5:08 PM
> To: unicode@unicode.org
> Subject: Character encoding at the prompt
>
>
> Hi,
>
> Do you have any idea what is the default code page and encoding scheme for
> MS DOS box in WinNT 4? Is there any command that can give me the info? I am
> trying to input a string say "fráç" at the prompt, wondering how the
> characters are encoded.
>
> How about at the Unix (Solaris 2.6) prompt, what's the default and how to
> change?
>
> Thanks.
>
> Will
>
>
>
>


This archive was generated by hypermail 2.1.2 : Thu Oct 25 2001 - 13:45:00 EDT