Re: problem - non-ASCII characters on Windows command line

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Thu Jan 29 2004 - 13:40:58 EST

Next message: Mark Davis: "Re: Collation charts out of date"

Previous message: Markus Scherer: "Re: problem - non-ASCII characters on Windows command line"
In reply to: Markus Scherer: "Re: problem - non-ASCII characters on Windows command line"

----- Original Message -----
From: "Markus Scherer" <markus.scherer@jtcsv.com>
To: "Deepak Chand Rathore" <deepakr@aztec.soft.net>; "unicode"
<unicode@unicode.org>
Sent: Thursday, January 29, 2004 5:50 PM
Subject: Re: problem - non-ASCII characters on Windows command line

> Hi Deepak, I recommend to keep this thread on the unicode list for a
> better chance of getting the right answer.
>
> As I said in my earlier email, I would try the Windows command line window
> (DOS prompt window) and set it to Unicode mode via "chcp 10000".
>
> I just tried this on Windows 2000, and pasting Unicode characters (that
> are not in the OEM codepage) from the character map does not work. It
> appears to perform a conversion from Unicode to the OEM codepage (and then
> back out).

CHCP at the Windows command prompt changes only the OUTPUT codepage, i.e.
the way characters WRITTEN to the console are interpreted (and possibly
converted, on Windows 9x/ME) and then stored by the console itself in its
display buffer.

It does not change the INPUT codepage. So when you paste characters, they
are sent to the console as if they had been typed on the keyboard, because
the code that takes characters from the clipboard and sends them to the
console serializes them through a DOS/BIOS-compatible 8-bit input buffer,
from which the shell or program reading input from the console will read
through the BIOS/DOS emulation interrupts.

So although you can run:
    C:\> MODE CON /STATUS
    Status of peripheral CON:
    -----------------------
        Lines: 300
        Columns: 80
        Keyboard Speed: 31
        Keyboard Delay: 1
        Code Page: 10000
and see that the console now uses the Unicode codepage, the command-line
application or shell will not detect the change of codepage when
interpreting bytes coming from the DOS/BIOS emulation interrupts, and will
continue to interpret them with the input codepage set by the current
keyboard driver selection.
Conversely, an application that outputs characters to the console will
behave correctly in the new codepage, because the current DOS keyboard
driver selection is not involved.
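
To make the mismatch concrete, here is a small sketch in Python (not
Windows-specific, and only an illustration of the principle): the same byte
gives a different character depending on which codepage the reader assumes.
The function name reinterpret and the choice of cp850 and cp1252 are my own
assumptions for the example, not what any particular console uses.

```python
def reinterpret(text, written_as, read_as):
    """Encode text in one codepage, then decode the bytes in another.

    This mimics a writer and a reader that disagree about the codepage.
    Bytes that are undefined in the reader's codepage become U+FFFD.
    """
    raw = text.encode(written_as)
    return raw.decode(read_as, errors="replace")


if __name__ == "__main__":
    # Same codepage on both sides: the character survives.
    print(repr(reinterpret("é", "cp850", "cp850")))
    # Reader assumes a different codepage: the byte 0x82 is misread.
    print(repr(reinterpret("é", "cp850", "cp1252")))
```

The second call shows the kind of wrong character you would expect when
the reading side keeps its old codepage.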

So CHCP does not seem to change the codepage used in the DOS emulation
keyboard driver, which apparently continues to use the codepage associated
with the currently selected keyboard driver in the regional settings (or in
the language bar on XP).
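
One way to check this on a given system would be to ask the console for both
codepages before and after running CHCP. The sketch below assumes a Win32
console and uses the kernel32 functions GetConsoleCP (input) and
GetConsoleOutputCP (output) through ctypes; the helper name
console_codepages is hypothetical, and on non-Windows systems it simply
returns None.

```python
import ctypes
import sys


def console_codepages():
    """Return (input_codepage, output_codepage) of the attached console.

    Returns None when not running on Windows, where these Win32 calls
    are not available.
    """
    if sys.platform != "win32":
        return None
    kernel32 = ctypes.windll.kernel32
    return kernel32.GetConsoleCP(), kernel32.GetConsoleOutputCP()


if __name__ == "__main__":
    print(console_codepages())
```

If the two numbers differ after CHCP, that would be consistent with the
behaviour described above; I have not verified the result on every Windows
version.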

Alternatively, you may create a keyboard driver for a language mapped to
the Unicode codepage, and select it in the language bar or in the user's
regional settings. It will fix the problem for both input and output. I
don't know how to tell the DOS emulation/console keyboard driver to put
pasted characters in its output queue so that they will be interpreted as
being in another codepage. It seems that characters are queued in the
console input buffer only after a required conversion to the current
keyboard codepage.
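
The effect of such a forced conversion can be sketched in Python. This is
only an approximation: I assume cp437 as the 8-bit codepage and use "?" for
unmappable characters, whereas the real Windows conversion may pick a
best-fit substitute instead. The function name round_trip is hypothetical.

```python
def round_trip(text, codepage="cp437"):
    """Convert Unicode text to an 8-bit codepage and back.

    Characters that do not exist in the codepage are replaced with "?",
    so the result shows what is lost on the way through an 8-bit buffer.
    """
    return text.encode(codepage, errors="replace").decode(codepage)


if __name__ == "__main__":
    for sample in ("é", "€", "Ж"):
        print(repr(sample), "->", repr(round_trip(sample)))
```

Characters present in the 8-bit codepage come back unchanged, while the
others are lost, which matches what Markus observed when pasting from the
character map.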


This archive was generated by hypermail 2.1.5 : Thu Jan 29 2004 - 14:44:35 EST