Re: problem - non-ASCII characters on Windows command line

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Thu Jan 29 2004 - 13:40:58 EST


    ----- Original Message -----
    From: "Markus Scherer" <markus.scherer@jtcsv.com>
    To: "Deepak Chand Rathore" <deepakr@aztec.soft.net>; "unicode"
    <unicode@unicode.org>
    Sent: Thursday, January 29, 2004 5:50 PM
    Subject: Re: problem - non-ASCII characters on Windows command line

    > Hi Deepak, I recommend to keep this thread on the unicode list for a
    > better chance of getting the right answer.
    >
    > As I said in my earlier email, I would try the Windows command line window
    > (DOS prompt window) and set it to Unicode mode via "chcp 10000".
    >
    > I just tried this on Windows 2000, and pasting Unicode characters (that
    > are not in the OEM codepage) from the character map does not work. It
    > appears to perform a conversion from Unicode to the OEM codepage (and
    > then back out).

    CHCP on the Windows command prompt only changes the OUTPUT codepage,
    i.e. the way characters WRITTEN to the console are interpreted, possibly
    converted on Windows 9x/ME, and stored by the console itself in its display
    buffer.

    It does not change the INPUT codepage. So when you paste characters, they
    are sent to the console as if they had been typed on the keyboard, because
    the code that takes characters from the clipboard and sends them to the
    console serializes them through a DOS/BIOS-compatible 8-bit input buffer.
    The shell or program reading input from the console then reads from that
    buffer through the BIOS/DOS emulation interrupts.

    So although you can run:
        C:\> MODE CON /STATUS
        Status of peripheral CON:
        -----------------------
            Lines: 300
            Columns: 80
            Keyboard Speed: 31
            Keyboard Delay: 1
            Code Page: 10000
    and see that the console now uses the Unicode codepage, the command-line
    application or shell will not detect the change of codepage when
    interpreting bytes coming from the DOS/BIOS emulation interrupts, and will
    continue to interpret them with the input codepage set by the current
    keyboard driver selection.
    Conversely, an application that outputs characters to the console will
    behave correctly in the new codepage, because the current DOS keyboard
    driver selection is not involved.

    So CHCP does not seem to change the codepage used in the DOS emulation
    keyboard driver, which apparently continues to use the codepage associated
    with the currently selected keyboard driver in the regional settings (or in
    the language bar on XP).
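
    One way to check on a given system whether the input and output
    codepages really differ is to query both through the Win32 calls
    GetConsoleCP and GetConsoleOutputCP. Below is a minimal Python sketch,
    assuming Python with ctypes is available; the helper name
    console_code_pages is made up for illustration, and what it reports
    after CHCP may vary by Windows version.

```python
import ctypes
import sys


def console_code_pages():
    """Return (input_cp, output_cp) for the attached console, or None.

    None is returned when not running on Windows.
    """
    if sys.platform != "win32":
        return None
    kernel32 = ctypes.windll.kernel32
    # GetConsoleCP: code page used to translate keyboard (and pasted) input.
    # GetConsoleOutputCP: code page used for text written to the console.
    return kernel32.GetConsoleCP(), kernel32.GetConsoleOutputCP()


if __name__ == "__main__":
    pages = console_code_pages()
    if pages is None:
        print("Not on Windows: no console code pages to query.")
    else:
        print("input code page: %d, output code page: %d" % pages)
```

    Running it before and after CHCP in the same console window would show
    whether only the output value changes.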

    Alternatively, you may create a keyboard driver for a language mapped to
    the Unicode codepage, and select it in the language bar or in the user's
    regional settings. It will fix the problem for both input and output. I
    don't know how you can tell the DOS emulation/console keyboard driver to
    put pasted characters in its output queue so that they will be interpreted
    as being in another codepage. It seems that characters are queued in the
    console input buffer only after a required conversion to the current
    keyboard codepage.
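
    The effect of that conversion on pasted text can be imitated outside
    the console. The Python sketch below pushes a string through an 8-bit
    codepage and back; cp437 is only an assumed stand-in for the keyboard
    codepage, the function name paste_round_trip is hypothetical, and the
    real Windows conversion may choose best-fit substitutes rather than '?'.

```python
def paste_round_trip(text, codepage="cp437"):
    """Simulate pasted text being squeezed through an 8-bit code page."""
    # Characters missing from the code page are replaced by '?'.
    raw = text.encode(codepage, errors="replace")
    return raw.decode(codepage)


if __name__ == "__main__":
    for sample in ("abc", "\u00e9", "\u0416", "\u4e2d"):
        back = paste_round_trip(sample)
        status = "preserved" if back == sample else "lost"
        print(ascii(sample), "->", ascii(back), status)
```

    Characters that exist in the 8-bit codepage survive the round trip;
    anything else is lost before the application ever reads it.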


