Re: problem - non-ASCII characters on Windows command line

From: Markus Scherer
Date: Thu Jan 22 2004 - 12:11:19 EST

    Your code looks like a Windows program.

    I recommend using the WCHAR* version of main() itself: wmain(), _wmain(), or similar. It's been a
    while since I did this... see MSDN for details.
    In other words, don't take the char* version of main() and then try to convert to Unicode; use
    the Unicode version of main() directly. You will then get WCHAR *argv[] right away.
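
    A minimal sketch of that entry point, assuming the Microsoft C runtime (where the wide entry
    point is spelled wmain). The helper count_non_ascii() is hypothetical and exists only to show
    that the arguments arrive as wide characters; the wmain() part is guarded so the helper also
    compiles elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper (not from the original post): counts the code units
   above 0x7F in a wide string, i.e. the non-ASCII ones. */
static size_t count_non_ascii(const wchar_t *s)
{
    size_t n = 0;
    assert(s != NULL);
    for (; *s != L'\0'; ++s) {
        if ((unsigned long)*s > 0x7F) {
            ++n;
        }
    }
    return n;
}

#ifdef _WIN32
/* Wide-character entry point: the arguments are delivered as WCHAR
   (UTF-16) strings, so no char* to Unicode conversion is needed. */
int wmain(int argc, wchar_t *argv[])
{
    int i;
    for (i = 1; i < argc; ++i) {
        /* Only ASCII is printed here, so the console codepage does not matter. */
        printf("argv[%d]: %lu code units, %lu non-ASCII\n", i,
               (unsigned long)wcslen(argv[i]),
               (unsigned long)count_non_ascii(argv[i]));
    }
    return 0;
}
#endif
```

    The program deliberately prints only counts, not the arguments themselves, so the check does
    not depend on how the console renders non-ASCII output.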

    Also, try not to output to yet another non-Unicode codepage. In your case, you get input in the system
    "ANSI" codepage (which is the Windows non-Unicode codepage for legacy applications), and since you
    output to the console, your output is converted to the "OEM" codepage. The two usually differ (for
    example, 1252 versus 437 or 850 on Western European systems).

    At a minimum, try setting your console to Unicode (UTF-8) via "chcp 65001"; UTF-16LE (codepage 1200)
    cannot be selected as a console codepage. Alternatively, try setting it to your "ANSI" codepage via
    "chcp 1252" or whatever is appropriate.

    It would be better if you did not have to convert out to a non-Unicode codepage at all. For example,
    if the output is consumed by Notepad or another application (via a pipe or output redirect etc.),
    you could just output in UTF-8 (codepage 65001 on Windows, I believe) or UTF-16LE (byte-serialize
    your WCHAR*). I recommend prepending U+FEFF to your output stream because many Windows applications
    recognize it as the Unicode signature.
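
    Byte-serializing with the signature could look roughly like the sketch below. The function name
    utf16le_with_bom() and its buffer-based interface are made up for illustration; it assumes
    16-bit code units, as with WCHAR on Windows, and uses uint16_t so that it compiles anywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes the U+FEFF signature followed by each 16-bit
   code unit, low byte first (UTF-16LE). Returns the number of bytes
   written, or 0 if the output buffer is too small. */
static size_t utf16le_with_bom(const uint16_t *units, size_t count,
                               unsigned char *out, size_t out_cap)
{
    size_t need = 2 * (count + 1);   /* signature + two bytes per unit */
    size_t pos = 0;
    size_t i;

    assert(units != NULL && out != NULL);
    if (out_cap < need) {
        return 0;
    }

    /* U+FEFF in little-endian byte order is FF FE. */
    out[pos++] = 0xFF;
    out[pos++] = 0xFE;

    for (i = 0; i < count; ++i) {
        out[pos++] = (unsigned char)(units[i] & 0xFF);  /* low byte  */
        out[pos++] = (unsigned char)(units[i] >> 8);    /* high byte */
    }
    return pos;
}
```

    The resulting bytes should go to a stream opened in binary mode; in text mode the C runtime on
    Windows expands every 0x0A byte to 0x0D 0x0A, which corrupts UTF-16 output.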

    Best regards,

    Opinions expressed here may not reflect my company's positions unless otherwise noted.

    This archive was generated by hypermail 2.1.5 : Thu Jan 22 2004 - 13:57:14 EST