RE: wide chars and methods

From: Vidya Maheshwar Nabar (vidyan@aztec.soft.net)
Date: Thu Nov 10 2005 - 06:22:25 CST


Hi Shawn/Bob,

Sorry for such a late reply; was on vacation.

The system on which I run my code uses the locale 'Japanese_Japan.932'.

I modified my original program as follows and built it with _UNICODE defined:
// Listing 1
wchar_t wstr[MAX];
wcin >> wstr;
wcout << wstr << endl;

I ran it by pasting the same U+307E, U+305B and U+3093 input. The output
displays the Japanese string only the first time the program is executed.
Subsequent runs display no output until the system is restarted.

So I modified the program further, as follows (retaining _UNICODE):
// Listing 2
wchar_t wstr[MAX];
char str[MAX];
cin >> str; // accept input in char
// Pass -1 as the length so the terminating null is converted too and wstr ends up null-terminated.
MultiByteToWideChar( CP_ACP, 0, str, -1, wstr, MAX );
// work with wstr in routines that need wchar_t...
cout << str; // display output as char

This works, like it should, since we're using char and ANSI methods.

- Does this mean that we can't use a wchar_t/wcin/wcout combination, even
when compiled with _UNICODE, to accept and display wide characters through the
console? (Incidentally, I tried char/cin/cout with _UNICODE and it works fine.)
Am I still missing something in the above programs?
- Since the console is ANSI, it would seem there is no way to type in
Japanese characters, although the console apparently accepts them when pasted.
So how does one enter multibyte characters?

Regards,
Vidya.

-----Original Message-----
From: Bob Eaton [mailto:pete_dembrowski@hotmail.com]
Sent: Saturday, October 29, 2005 12:51 PM
To: Vidya Maheshwar Nabar; Shawn Steele; Dominikus Scherkl
Cc: unicode@unicode.org
Subject: Re: wide chars and methods

Vidya,

The reason 'char' works is because you're probably not building the app with
the _UNICODE compiler define and so yours is really an "Ansi" app. Ansi apps
work on NT-based OSs for those ranges of Unicode that have code page support
(cf. the other thread on the Unicode list about "ANSI and Unicode for x00 -
xFF").

So Japanese will work because NT provides code page 932 (I think) to turn
wide characters into narrow (char) characters and vice versa automatically.
As Murray and Michael pointed out in that other thread, however, this won't
work for Devanagari because Devanagari doesn't have "Ansi" code page
support.

If you build your app with _UNICODE, then 'char' won't work because each
character will be (nominally) two bytes.

The reason you have to do setlocale is so you can tell the system what code
page to use to convert those single byte (or with Japanese, double-byte)
characters into wide/Unicode and vice versa (aside: this is probably only
true if the code page you set in setlocale is different from the default
system code page--i.e. if the default system code page is already set to
932, then you probably don't have to do setlocale).
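
To make the role of setlocale concrete, here is a minimal sketch of a narrow-to-wide conversion that goes through the current C locale. The helper names `widen` and `use_system_locale` are hypothetical; `std::setlocale` and `std::mbstowcs` are standard library calls. The sketch assumes a C++11 compiler, and it assumes that on a system whose default locale is 'Japanese_Japan.932' the conversion would interpret the bytes as code page 932.

```cpp
#include <clocale>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: adopt the user's default locale (and with it the
// default code page). Returns false if the locale could not be set.
bool use_system_locale() {
    return std::setlocale(LC_ALL, "") != nullptr;
}

// Hypothetical helper: convert a narrow (multibyte) string to a wide string
// using the encoding of the current C locale. Returns an empty string if the
// bytes are not valid in that locale.
std::wstring widen(const std::string& narrow) {
    // A wide string never has more elements than the narrow one has bytes,
    // so size() + 1 leaves room for the terminating null.
    std::vector<wchar_t> buffer(narrow.size() + 1);
    std::size_t written =
        std::mbstowcs(buffer.data(), narrow.c_str(), buffer.size());
    if (written == static_cast<std::size_t>(-1)) {
        return std::wstring();  // invalid byte sequence for this locale
    }
    return std::wstring(buffer.data(), written);
}
```

Calling `use_system_locale()` once at startup and then `widen(str)` on each input line mirrors the setlocale-then-convert order described above. Without the setlocale call the program stays in the "C" locale, where only plain ASCII is guaranteed to convert.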

Keyboard input and the file i/o are completely different things. The
keyboard will probably work if you use setlocale, so that the system knows
which code page to use when it sends you narrow Ansi characters (having
converted them from wide/Unicode). The file i/o depends on whether you're using
the _UNICODE switch or not. If it's not set, then you'll read/write single
(or double) "Ansi" byte(s) for each character. If set, then you'll
read/write UTF-16 words (nominally, 2 bytes) for each character.
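
The word "nominally" matters here: most characters, including the three in this thread, take one 16-bit UTF-16 unit, but characters above U+FFFF take two (a surrogate pair). A minimal sketch of the encoding step follows. The function name `to_utf16` is hypothetical, and the sketch assumes the input is a valid code point (it does not reject values above U+10FFFF or lone surrogates).

```cpp
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-16 code units.
// Assumes `cp` is a valid scalar value; no validation is performed.
std::u16string to_utf16(char32_t cp) {
    std::u16string out;
    if (cp < 0x10000) {
        // Basic Multilingual Plane: one 16-bit unit.
        out.push_back(static_cast<char16_t>(cp));
    } else {
        // Supplementary planes: split the 20-bit offset into a surrogate pair.
        cp -= 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));    // high
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));  // low
    }
    return out;
}
```

For example, `to_utf16(0x307E)` yields the single unit 0x307E, while `to_utf16(0x20BB7)` yields the pair 0xD842 0xDFB7.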

If you use TCHAR, then it works in both Ansi (aka. _MBCS) and _UNICODE mode.
If you use wchar_t, it only makes sense if you also have _UNICODE set.
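
As a rough illustration of how a TCHAR-style type can work in both modes, the sketch below switches a character type and a literal macro on a single compiler define. All names here (`DEMO_UNICODE`, `demo_tchar`, `DEMO_T`, `demo_tstring`, `greeting`) are hypothetical stand-ins chosen so the example compiles anywhere; on Windows the real equivalents come from `<tchar.h>`.

```cpp
#include <string>

// Hypothetical stand-ins for TCHAR and _T(): one define selects the
// character type and the matching string-literal prefix.
#ifdef DEMO_UNICODE
typedef wchar_t demo_tchar;
#define DEMO_T(s) L##s
#else
typedef char demo_tchar;
#define DEMO_T(s) s
#endif

// A string type that follows whichever character type was selected.
typedef std::basic_string<demo_tchar> demo_tstring;

// The same source line compiles to a narrow or a wide string.
demo_tstring greeting() {
    return DEMO_T("hello");
}
```

Code written against `demo_tchar` and `DEMO_T` builds unchanged in either mode, which is the property described above. Code written directly against wchar_t gives that flexibility up.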

Suggestion: I avoid wchar_t myself, because if you end up linking with other
libraries, they have to be using wchar_t also or the link fails.

Bob

P.S. If you define the compiler switch _UNICODE, be sure to remove the _MBCS
switch, with which it is mutually exclusive.

P.P.S. Not that I'm an expert and I'm sure that others could do a better
job, but I've agreed to do a "webchat" on the topic of encoding conversion
and porting VC++ 6.0 programs to support Unicode. If you're interested,
here's the link: http://bhashaindia.com/events/chat/
----- Original Message -----
From: Vidya Maheshwar Nabar <mailto:vidyan@aztec.soft.net>
To: Shawn Steele <mailto:Shawn.Steele@microsoft.com>; Dominikus Scherkl
<mailto:dscherkl@xpaneon.com>
Cc: unicode@unicode.org
Sent: Friday, October 28, 2005 2:41 PM
Subject: RE: wide chars and methods

Hi Shawn/Dominikus,

Thanks for the response.

I don't want to use TCHAR around the code for some reason. I thought I
should be using 'wchar_t' for holding Japanese text, but 'char' seemed to
work fine. As for wchar_t, I could see Japanese strings only after I made a
setlocale( LC_ALL, "" ) call, (though I still cannot 'type in' Japanese;
when the focus is in the console, the Japanese IME which is otherwise
displayed, disappears), hence the question about wide char/method usability.
- Why do we need setlocale when using wchar_t to display Japanese strings?
Even with this workaround, I'm still unable to enter Japanese strings via
console.
- This distinction doesn't seem to extend to file streams. So, does console
i/o differ from disk i/o? I tried reading from/writing to files with
Japanese strings using both char and wchar_t (sans setlocale) without any
issues.

Regards,
Vidya.

-----Original Message-----
From: Shawn Steele [mailto:Shawn.Steele@microsoft.com]
Sent: Tuesday, October 25, 2005 11:37 PM
To: Vidya Maheshwar Nabar; unicode@unicode.org
Subject: RE: wide chars and methods

Windows 2000 Server is natively Unicode. You cannot "hold all the
characters" of that OS in an 8 bit char.

The Windows console is restricted to "ansi" code pages, which is probably
why you're seeing the behavior you're seeing. It's strongly recommended that
you avoid using ANSI applications and use Unicode instead.

- Shawn

SDE, Microsoft

_____

From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On
Behalf Of Vidya Maheshwar Nabar
Sent: Tuesday, October 25, 2005 2:55 AM
To: unicode@unicode.org
Subject: wide chars and methods

Hi,

I wanted to know why 'wchar_t' data type is required if a 'char' can very
well hold all the characters on a given OS. To elaborate, I run the program
below on a Japanese Win 2K Server and pass Japanese strings:

Code Snippet(VC++ 6.0):
char str[MAX];
cin >> str;
cout << str << endl;

Input:
ません

Output:
ません

Note: here, input is U+307E,U+305B and U+3093.

The above program runs fine with char and cin/cout/scanf/printf; in fact,
things go wrong when I use wchar_t and wcin/wcout/wscanf/wprintf: it just
doesn't output anything!

How and why are cin/cout/scanf/printf able to process Japanese strings on a
Japanese machine with char, when wcin/wcout/wscanf/wprintf with wchar_t are
not? Aren't wchar_t and the wide methods needed for characters beyond the
8-bit range, since char can't handle them? Am I missing something here?
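
One way to see why a char buffer appears to "hold" the Japanese text: in code page 932 each of these characters occupies two chars, so the buffer carries bytes rather than characters. The sketch below counts characters in a CP932 byte string by skipping the trail byte after each lead byte. The function names are hypothetical, and the lead-byte ranges (0x81-0x9F and 0xE0-0xFC) are the ones used by code page 932.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true if `b` is a CP932 (Shift-JIS) lead byte, i.e. the
// first byte of a two-byte character.
bool is_cp932_lead_byte(unsigned char b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Hypothetical helper: count characters (not bytes) in a CP932 string.
std::size_t cp932_char_count(const std::string& bytes) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        // A lead byte consumes the following trail byte as well.
        if (is_cp932_lead_byte(static_cast<unsigned char>(bytes[i])) &&
            i + 1 < bytes.size()) {
            ++i;
        }
        ++count;
    }
    return count;
}
```

The input from this message, U+307E U+305B U+3093, is the six bytes 82 DC 82 B9 82 F1 in code page 932, so strlen reports 6 while `cp932_char_count` reports 3.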

Thanks in advance.

Regards,
Vidya.



This archive was generated by hypermail 2.1.5 : Thu Nov 10 2005 - 11:44:02 CST