Re: problem - non-ASCII characters on Windows command line

From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Thu Jan 29 2004 - 11:50:46 EST

    Hi Deepak, I recommend keeping this thread on the unicode list, where it has a better chance of
    getting the right answer.

    As I said in my earlier email, I would try the Windows command line window (DOS prompt window) and
    set it to Unicode mode via "chcp 10000".

    I just tried this on Windows 2000, and pasting Unicode characters (that are not in the OEM codepage)
    from the character map does not work. It appears to perform a conversion from Unicode to the OEM
    codepage (and then back out).

    My other machine has Windows XP. There, the same experiment works - I can paste non-Latin-1 accented
    Latin characters, Greek, the Euro symbol, etc.

    I have not tried this on either machine with a non-English keyboard or IME.
    I do not have other shells available on my Windows machines.
    Microsoft people (and users) on the list should be able to give more tips.

    Best regards,
    markus

    Deepak Chand Rathore wrote:

    > Hi Markus,
    > Do you know of any shell through which we can enter 16-bit file names in Windows?
    > In Windows 2000, both FAT and NTFS use the Unicode character set for their names,
    > but I am able to enter 16-bit characters only through the GUI.
    > Does such a shell exist or not?
    >
    > Thanks for your ideas.
    >
    > regards,
    > deepak
    >
    > -----Original Message-----
    > From: Markus Scherer [mailto:markus.scherer@jtcsv.com]
    > Sent: Thursday, January 22, 2004 22:41
    > To: unicode
    > Subject: Re: problem - non-ASCII characters on Windows command line
    >
    >
    > Your code looks like a Windows program.
    >
    > I recommend using the WCHAR* version of main() itself - wmain() or _wmain() or similar. It's been
    > a while since I did this... see MSDN for details.
    > In other words, don't just use a char* version of main() and then try to convert to Unicode, but
    > use the Unicode version of main() directly. You will then get WCHAR *argv[] right away.
    >
    > Also, try not to output to another non-Unicode codepage. In your case, you get input in the
    > system "ANSI" codepage (which is the Windows non-Unicode codepage for legacy applications), and
    > since you output to the console, your output is converted to the "OEM" codepage.
    >
    > At a minimum, try setting your console to Unicode (UTF-16LE) via "chcp 10000". Alternatively,
    > try setting it to your "ANSI" codepage via "chcp 1252" or whatever is appropriate.
    >
    > It would be better if you did not have to convert out to a non-Unicode codepage at all. For
    > example, if the output is consumed by Notepad or another application (via a pipe or output
    > redirect etc.), you could just output in UTF-8 (codepage 65001 on Windows, I believe) or
    > UTF-16LE (byte-serialize your WCHAR*). I recommend prepending U+FEFF to your output stream
    > because many Windows applications recognize it as the Unicode signature.
    >
    > Best regards,
    > markus
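
The suggestion above to "byte-serialize your WCHAR*" as UTF-16LE with a leading U+FEFF could look roughly like the sketch below. It is only an illustration, not code from the thread: the function name utf16le_with_bom is hypothetical, and it assumes the input is already a sequence of 16-bit UTF-16 code units (as WCHAR is on Windows), represented here as uint16_t so the sketch stays portable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the original thread): serialize `count`
 * UTF-16 code units into `out` as UTF-16LE bytes, preceded by the
 * U+FEFF signature (bytes FF FE in little-endian order).
 *
 * Returns the number of bytes written, or 0 if `out` is NULL or too small.
 */
size_t utf16le_with_bom(const uint16_t *units, size_t count,
                        unsigned char *out, size_t out_size)
{
    /* Precondition: a NULL input is only acceptable for an empty string. */
    assert(units != NULL || count == 0);

    /* Guard against overflow when computing 2 * (count + 1). */
    if (count > (SIZE_MAX - 2) / 2) {
        return 0;
    }
    size_t needed = 2 * (count + 1);
    if (out == NULL || out_size < needed) {
        return 0;
    }

    size_t pos = 0;

    /* U+FEFF, low byte first. */
    out[pos++] = 0xFF;
    out[pos++] = 0xFE;

    /* Each code unit: low byte first, then high byte. */
    for (size_t i = 0; i < count; i++) {
        out[pos++] = (unsigned char)(units[i] & 0xFF);
        out[pos++] = (unsigned char)(units[i] >> 8);
    }
    return pos;
}
```

The resulting buffer would then be written with fwrite() to a stream in binary mode. On Windows, stdout defaults to text mode, which rewrites 0x0A bytes and would corrupt UTF-16 output, so something like _setmode(_fileno(stdout), _O_BINARY) is probably needed before writing.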


