RE: Character encoding at the prompt

From: Addison Phillips [wM] (aphillips@webmethods.com)
Date: Wed Oct 24 2001 - 18:18:07 EDT


Hi William,

The answer is that it depends on the current user locale.

Generally, Western European languages in Windows use Code Page 1252 for GUI
displays and either Code Page 437 (US English) or Code Page 850 for "DOS
boxes" (the "cmd" prompt). Typing "chcp" with no argument shows the active
code page, and on Windows NT "chcp" followed by a code page number (for
example "chcp 850") changes it manually. Changing your actual system locale
("Regional Options") will also change the Windows and command line code
pages as appropriate.
Fair warning: do NOT experiment with Asian locales on European builds of NT
4.0 systems (that you care about). In "Microsoft-ese", the Windows code
page is the ANSI code page and the command line code page is the OEM code
page. In this case, ANSI has nothing to do with the standards organization
or any particular encoding; it's just a name to differentiate the code page
from the OEM flavor. There is documentation on the MS website, but I am too
pressed for time to look up the URL.
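
To make the ANSI/OEM split concrete, here is a minimal Python sketch (Python
is my choice for illustration, it is not something from the thread) that
encodes William's sample string with the code pages named above. The codec
names cp1252, cp437 and cp850 are the ones Python's standard library uses;
the helper name hex_bytes is made up for this example.

```python
# Sketch: how the sample string from the question is stored under the code
# pages mentioned above. hex_bytes is a hypothetical helper, not a library
# call.
SAMPLE = "fr\u00e1\u00e7"  # the question's sample: f, r, a-acute, c-cedilla


def hex_bytes(text, codec):
    """Encode text with the given codec and return the bytes as hex pairs."""
    return " ".join(f"{b:02X}" for b in text.encode(codec))


for label, codec in [
    ("ANSI, Code Page 1252", "cp1252"),
    ("OEM,  Code Page 437", "cp437"),
    ("OEM,  Code Page 850", "cp850"),
]:
    print(f"{label}: {hex_bytes(SAMPLE, codec)}")

# Bytes written by a GUI program (ANSI) and then read as OEM text come out
# as different characters; ascii() keeps the output printable anywhere.
misread = SAMPLE.encode("cp1252").decode("cp437")
print("cp1252 bytes read as cp437:", ascii(misread))
```

The two accented letters are single bytes in all three code pages, but the
byte values differ between the ANSI and OEM pages, which is why text typed
in a DOS box can look wrong in a GUI editor and vice versa.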

On most UNIX-like operating systems, the current locale controls the
encoding. In fact, the encoding is part of the locale name. Generally,
Western European languages use ISO-8859-1 (aka Latin-1). Solaris 2.7 and
especially 2.8 add support for nifty new encodings (including UTF-8, a
Unicode encoding). If you type "locale" at the shell prompt, you will see a
listing of your various locale settings, which will include the current
encoding. Unlike Windows, the locale (and thus the encoding) applies to both
command line and GUI interfaces. Also unlike Windows, the locale setting is
process-specific. Child processes inherit the parent's environment, so if
you change your locale and then launch a GUI program, that program will have
a matching locale. Of course, this is a generalization.
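
As a rough illustration of the two points above (the encoding rides along in
the locale name, and children inherit the setting through the environment),
here is a small Python sketch. It assumes the common POSIX naming pattern
language[_territory][.codeset][@modifier]; the function names and the sample
locale names are made up for the example, and whether a given locale is
actually installed depends on your system.

```python
import os
import subprocess
import sys


def parse_locale_name(name):
    """Split a locale name of the form language[_territory][.codeset][@modifier].

    Hypothetical helper for illustration; missing parts come back as None.
    """
    base, _, modifier = name.partition("@")
    base, _, codeset = base.partition(".")
    language, _, territory = base.partition("_")
    return {
        "language": language,
        "territory": territory or None,
        "codeset": codeset or None,
        "modifier": modifier or None,
    }


def child_locale(lc_all):
    """Start a child process with LC_ALL set and return the value it sees.

    The child gets a copy of this process's environment plus LC_ALL, which
    mirrors how a program launched from a shell picks up the shell's locale.
    """
    env = dict(os.environ, LC_ALL=lc_all)
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ.get('LC_ALL', ''))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Sample names only; run "locale -a" to see what your system really offers.
for sample in ("fr_FR.ISO8859-1", "en_US.UTF-8", "C"):
    print(sample, "->", parse_locale_name(sample)["codeset"])
```

Calling child_locale("fr_FR.ISO8859-1") hands that string to the child via
its environment, the same mechanism a shell uses when you set LC_ALL or LANG
before starting a program.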

Don't forget that file systems and shells have a part to play in your
command line excursions.

Hope this helps.

Addison

Addison P. Phillips
Globalization Architect / Manager, Globalization Engineering
webMethods, Inc. 432 Lakeside Drive, Sunnyvale, CA
+1 408.962.5487 (phone) +1 408.210.3569 (mobile)
-------------------------------------------------
Internationalization is an architecture. It is not a feature.

-----Original Message-----
From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On
Behalf Of Tay, William
Sent: Wednesday, October 24, 2001 5:08 PM
To: unicode@unicode.org
Subject: Character encoding at the prompt

Hi,

Do you have any idea what is the default code page and encoding scheme for
MS DOS box in WinNT 4? Is there any command that can give me the info? I am
trying to input a string say "fráç" at the prompt, wondering how the
characters are encoded.

How about at the Unix (Solaris 2.6) prompt, what's the default and how to
change?

Thanks.

Will


