Re: problem - non-ASCII characters on Windows command line

From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Thu Jan 22 2004 - 12:11:19 EST

Next message: jcowan@reutershealth.com: "Re: Unicode forms for internal storage - BOCU-1 speed"

Previous message: Markus Scherer: "Re: Unicode forms for internal storage - BOCU-1 speed"
In reply to: Deepak Chand Rathore: "problem"
Next in thread: Markus Scherer: "Re: problem - non-ASCII characters on Windows command line"
Maybe reply: Markus Scherer: "Re: problem - non-ASCII characters on Windows command line"

Your code looks like a Windows program.

I recommend using the wide-character (WCHAR*) version of main() itself: wmain(), or _tmain() with _UNICODE defined. It's been a
while since I did this; see MSDN for details.
In other words, don't use a char* version of main() and then try to convert to Unicode. Use
the Unicode version of main() directly, and you will get WCHAR *argv[] right away.
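Here is a minimal sketch of that. It assumes a Microsoft toolchain (MSVC, or MinGW-w64 with -municode) so that wmain() is the entry point. count_non_ascii() is a hypothetical helper. It is only there to show that each argv[i] arrives as a wide string, without first being converted through the ANSI codepage. The helper is plain C and builds anywhere; only wmain() is Windows-specific.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: counts code units above U+007F in one argument.
   It is ordinary C, so it builds and runs on any platform. */
static size_t count_non_ascii(const wchar_t *arg)
{
    size_t count = 0;
    assert(arg != NULL);
    for (; *arg != L'\0'; ++arg) {
        if ((unsigned long)*arg > 0x7FUL) {
            ++count;
        }
    }
    return count;
}

#ifdef _WIN32
/* Windows-only entry point: argv[] holds UTF-16 (WCHAR) strings taken
   from the command line, not converted through the ANSI codepage. */
int wmain(int argc, wchar_t *argv[])
{
    for (int i = 1; i < argc; ++i) {
        /* Print only the count; printing argv[i] itself to the console
           would go through a console codepage again. */
        printf("argument %d: %lu non-ASCII code unit(s)\n",
               i, (unsigned long)count_non_ascii(argv[i]));
    }
    return 0;
}
#endif
```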

Also, try not to output to another non-Unicode codepage. In your case, you get input in the system
"ANSI" codepage (the Windows non-Unicode codepage for legacy applications, e.g. 1252). Because you
write to the console, your output ends up in the "OEM" codepage (e.g. 437 or 850), which usually differs from the ANSI one.

At a minimum, try setting your console to Unicode, that is, to UTF-8 via "chcp 65001". chcp does not accept UTF-16LE.
Alternatively, try setting it to your "ANSI" codepage via "chcp 1252" or whatever is appropriate.

It would be better if you did not have to convert to a non-Unicode codepage at all. For example,
if the output is consumed by Notepad or another application (via a pipe or output redirection),
you could just output UTF-8 (codepage 65001 on Windows) or UTF-16LE (byte-serialize
your WCHAR*). I recommend prepending U+FEFF (the byte order mark) to your output stream, because many Windows applications
recognize it as a Unicode signature.
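Byte-serializing the text with a signature could look roughly like the following portable C sketch. The names write_utf16le_with_bom() and write_utf8_with_bom() are hypothetical and are not a Windows API. The input is assumed to be an array of UTF-16 code units (what a WCHAR* holds on Windows). The caller must supply an output buffer of at least the size noted in the comments. On Windows itself, WideCharToMultiByte() with CP_UTF8 would normally do the UTF-8 conversion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes U+FEFF, then each UTF-16 code unit as two
   little-endian bytes. `out` must hold at least 2 * (n + 1) bytes.
   Returns the number of bytes written. */
static size_t write_utf16le_with_bom(const uint16_t *s, size_t n,
                                     unsigned char *out)
{
    size_t pos = 0;
    assert(s != NULL || n == 0);
    out[pos++] = 0xFF;              /* U+FEFF, low byte first */
    out[pos++] = 0xFE;
    for (size_t i = 0; i < n; ++i) {
        out[pos++] = (unsigned char)(s[i] & 0xFF);
        out[pos++] = (unsigned char)(s[i] >> 8);
    }
    return pos;
}

/* Hypothetical helper: writes the UTF-8 signature EF BB BF, then converts
   UTF-16 to UTF-8, combining surrogate pairs; unpaired surrogates become
   U+FFFD. `out` must hold at least 3 + 3 * n bytes. Returns bytes written. */
static size_t write_utf8_with_bom(const uint16_t *s, size_t n,
                                  unsigned char *out)
{
    size_t pos = 0;
    assert(s != NULL || n == 0);
    out[pos++] = 0xEF;              /* U+FEFF encoded in UTF-8 */
    out[pos++] = 0xBB;
    out[pos++] = 0xBF;
    for (size_t i = 0; i < n; ++i) {
        uint32_t c = s[i];
        if (c >= 0xD800 && c <= 0xDBFF && i + 1 < n &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            /* lead + trail surrogate -> one supplementary code point */
            c = 0x10000 + ((c - 0xD800) << 10) + (uint32_t)(s[i + 1] - 0xDC00);
            ++i;
        } else if (c >= 0xD800 && c <= 0xDFFF) {
            c = 0xFFFD;             /* unpaired surrogate */
        }
        if (c < 0x80) {
            out[pos++] = (unsigned char)c;
        } else if (c < 0x800) {
            out[pos++] = (unsigned char)(0xC0 | (c >> 6));
            out[pos++] = (unsigned char)(0x80 | (c & 0x3F));
        } else if (c < 0x10000) {
            out[pos++] = (unsigned char)(0xE0 | (c >> 12));
            out[pos++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            out[pos++] = (unsigned char)(0x80 | (c & 0x3F));
        } else {
            out[pos++] = (unsigned char)(0xF0 | (c >> 18));
            out[pos++] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
            out[pos++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            out[pos++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    return pos;
}
```

If these bytes go to a redirected stdout on Windows, put stdout in binary mode first, for example with _setmode(_fileno(stdout), _O_BINARY) from <io.h> and <fcntl.h>. Otherwise the C runtime's text mode turns each 0x0A byte into 0x0D 0x0A, which corrupts UTF-16LE output.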

Best regards,
markus

-- 
Opinions expressed here may not reflect my company's positions unless otherwise noted.
