RE: wide chars and methods

From: Vidya Maheshwar Nabar (vidyan@aztec.soft.net)
Date: Thu Nov 10 2005 - 06:22:25 CST


    Hi Shawn/Bob,
     
    Sorry for such a late reply; was on vacation.
     
    The system on which I run my code has the locale 'Japanese_Japan.932'.
     
    I modified my original program to this and built it with _UNICODE:
    // Listing 1
    wchar_t wstr[MAX];
    wcin >> wstr;
    wcout << wstr << endl;
     
    I ran it by pasting the same input (U+307E, U+305B and U+3093). The output
    displays the Japanese string only the first time the program is executed;
    subsequent runs display nothing until the system is restarted.
     
    So I modified the program further as follows (still building with _UNICODE):
    // Listing 2
    wchar_t wstr[MAX];
    char str[MAX];
    cin >> str; // accept input in char
    MultiByteToWideChar( CP_ACP, 0, str, -1, wstr, MAX ); // -1: also convert the terminating NUL so wstr is NUL-terminated
    // work with wstr in routines that need wchar_t...
    cout << str; // display output as char
     
    This works, as it should, since we're using char and the ANSI functions.
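    For comparison, a portable standard C++ sketch of the Listing 2 conversion, assuming the current C locale (selected with setlocale) plays the role of CP_ACP. The helper name narrow_to_wide is hypothetical. It measures first and then converts, so the destination is always exactly the right size and properly terminated:

```cpp
#include <cwchar>
#include <optional>
#include <string>

// Hypothetical helper: narrow string in the current C locale -> wide string.
// Returns nullopt if the input is not a valid sequence in that locale.
std::optional<std::wstring> narrow_to_wide(const std::string& narrow) {
    std::mbstate_t state{};
    const char* src = narrow.c_str();
    // Measuring pass: with a null destination nothing is stored and the
    // result is the number of wide characters the conversion would produce.
    std::size_t n = std::mbsrtowcs(nullptr, &src, 0, &state);
    if (n == static_cast<std::size_t>(-1)) return std::nullopt;
    std::wstring wide(n, L'\0');
    src = narrow.c_str();
    state = std::mbstate_t{};
    // Converting pass: stores exactly n characters; std::wstring keeps
    // its own terminator, so none needs to be written here.
    std::mbsrtowcs(&wide[0], &src, n, &state);
    return wide;
}
```

    Called after std::setlocale(LC_ALL, "") on a system like the one above, the narrow bytes would be interpreted using the user's code page; in the default "C" locale only the basic character set is guaranteed to convert.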
     
    - Does this mean that we can't use the wchar_t/wcin/wcout combination to
    accept and display wide characters through the console, even when building
    with _UNICODE? (Incidentally, I tried char/cin/cout with _UNICODE and it
    works fine.) Am I still missing something in the above programs?
    - Since the console is ANSI, it would seem there's no way to type in or
    paste Japanese characters (though the console apparently does accept
    pasted text). So how does one enter multibyte characters?
     
    Regards,
    Vidya.
     
    -----Original Message-----
    From: Bob Eaton [mailto:pete_dembrowski@hotmail.com]
    Sent: Saturday, October 29, 2005 12:51 PM
    To: Vidya Maheshwar Nabar; Shawn Steele; Dominikus Scherkl
    Cc: unicode@unicode.org
    Subject: Re: wide chars and methods
     
    Vidya,
     
    'char' works because you're probably not building the app with the
    _UNICODE compiler define, so yours is really an "Ansi" app. Ansi apps work
    on NT-based OSs for those ranges of Unicode that have code page support
    (cf. the other thread on the Unicode list about "ANSI and Unicode for x00 -
    xFF").
     
    So Japanese will work because NT provides code page 932 (I think) to
    convert wide characters into narrow (char) characters and vice versa
    automatically.
    As Murray and Michael pointed out in that other thread, however, this won't
    work for Devanagari because Devanagari doesn't have "Ansi" code page
    support.
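    To make the "code page support" point concrete, here is a minimal sketch that uses ISO-8859-1 as a stand-in single-byte code page. That choice is an assumption for illustration only: it is not the Windows "Ansi" code page 1252, which differs in the 0x80-0x9F range. A wide character converts only if the code page has a byte for it, so Devanagari fails. The name to_latin1 is hypothetical:

```cpp
#include <optional>
#include <string>

// Hypothetical helper: ISO-8859-1 maps its 256 byte values one-to-one onto
// U+0000..U+00FF, so anything above U+00FF has no narrow representation.
std::optional<std::string> to_latin1(const std::u32string& text) {
    std::string out;
    for (char32_t cp : text) {
        if (cp > 0xFF) return std::nullopt;  // no byte for this character
        out.push_back(static_cast<char>(cp));
    }
    return out;
}
```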
     
    If you build your app with _UNICODE, then 'char' won't work because each
    character will be (nominally) two bytes.
     
    The reason you have to call setlocale is so you can tell the system which
    code page to use to convert those single-byte (or, with Japanese,
    double-byte) characters into wide/Unicode characters and vice versa
    (aside: this is probably only true if the code page you set in setlocale
    differs from the default system code page, i.e. if the default system code
    page is already 932, then you probably don't have to call setlocale).
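    A minimal portable sketch of why setlocale matters for wide output: every C/C++ program starts in the "C" locale, which has no multibyte mapping for Japanese, so converting U+307E to narrow bytes fails until setlocale switches to a locale that has one. The function name is hypothetical; std::wcrtomb is the standard conversion routine:

```cpp
#include <climits>
#include <cwchar>

// Hypothetical check: can `wc` be converted to a multibyte sequence in the
// current C locale? std::wcrtomb returns (size_t)-1 when it cannot.
bool narrowable_in_current_locale(wchar_t wc) {
    char buf[MB_LEN_MAX];
    std::mbstate_t state{};
    std::size_t n = std::wcrtomb(buf, wc, &state);
    return n != static_cast<std::size_t>(-1);
}
```

    Run before any setlocale call, the check succeeds for L'A' and fails for U+307E. After std::setlocale(LC_ALL, "") on a machine whose user locale is Japanese_Japan.932, it should succeed for U+307E as well.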
     
    Keyboard input and file i/o are completely different things. The keyboard
    will probably work if you use setlocale, so that the system knows which
    code page to use when sending you narrow Ansi characters (having converted
    them from wide/Unicode). File i/o depends on whether you're using the
    _UNICODE switch. If it's not set, you'll read/write a single (or double)
    "Ansi" byte for each character. If it is set, you'll read/write UTF-16
    words (nominally 2 bytes) for each character.
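    To illustrate what "UTF-16 words" means as bytes in a file, a small sketch that writes each code unit low byte first. Little-endian order is an assumption here (it is what x86 Windows uses, and a real file may also begin with a byte-order mark). "Nominally" 2 bytes because characters above U+FFFF take two code units (a surrogate pair). The name utf16le_bytes is hypothetical:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: serialize UTF-16 code units in little-endian order.
// Surrogate pairs, if present, are simply two code units here.
std::vector<std::uint8_t> utf16le_bytes(const std::u16string& text) {
    std::vector<std::uint8_t> bytes;
    bytes.reserve(text.size() * 2);
    for (char16_t unit : text) {
        bytes.push_back(static_cast<std::uint8_t>(unit & 0xFF));  // low byte
        bytes.push_back(static_cast<std::uint8_t>(unit >> 8));    // high byte
    }
    return bytes;
}
```

    For U+307E U+305B U+3093 this yields the six bytes 7E 30 5B 30 93 30.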
     
    If you use TCHAR, it works in both Ansi (aka _MBCS) and _UNICODE mode.
    If you use wchar_t, it only makes sense if you also have _UNICODE set.
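    <tchar.h> does this with the TCHAR type, the _T/_TEXT literal macros, and _t-prefixed functions such as _tcslen. Because that header is Windows-only, here is a hypothetical imitation (the names tchar_t, TXT and tcslen are made up for this sketch) showing the mechanism, including a guard for the _UNICODE/_MBCS conflict mentioned in the P.S. below:

```cpp
#include <cstring>
#include <cwchar>

// The two generic-text modes cannot be combined.
#if defined(_UNICODE) && defined(_MBCS)
#error "_UNICODE and _MBCS are mutually exclusive"
#endif

#ifdef _UNICODE
using tchar_t = wchar_t;   // wide build: characters are wchar_t
#define TXT(s) L##s        // TXT("abc") becomes L"abc"
inline std::size_t tcslen(const tchar_t* s) { return std::wcslen(s); }
#else
using tchar_t = char;      // Ansi/MBCS build: characters are char
#define TXT(s) s           // TXT("abc") stays "abc"
inline std::size_t tcslen(const tchar_t* s) { return std::strlen(s); }
#endif
```

    Code written against tchar_t, TXT and tcslen compiles unchanged in either mode, which is the property that makes TCHAR code portable between the two builds.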
     
    Suggestion: I avoid wchar_t myself, because if you end up linking with other
    libraries, they have to be using wchar_t also or the link fails.
     
    Bob
     
    P.S. If you define the compiler switch _UNICODE, be sure to remove the _MBCS
    switch; the two are mutually exclusive.
     
    P.P.S. Not that I'm an expert, and I'm sure others could do a better job,
    but I've agreed to do a "webchat" on the topic of encoding conversion and
    porting VC++ 6.0 programs to support Unicode. If you're interested, here's
    the link: http://bhashaindia.com/events/chat/
    ----- Original Message -----
    From: Vidya Maheshwar Nabar <vidyan@aztec.soft.net>
    To: Shawn Steele <Shawn.Steele@microsoft.com>; Dominikus Scherkl
    <dscherkl@xpaneon.com>
    Cc: unicode@unicode.org
    Sent: Friday, October 28, 2005 2:41 PM
    Subject: RE: wide chars and methods
     
     
    Hi Shawn/Dominikus,
     
    Thanks for the response.
     
    For various reasons, I don't want to use TCHAR throughout the code. I
    thought I should be using 'wchar_t' to hold Japanese text, but 'char'
    seemed to work fine. As for wchar_t, I could see Japanese strings only after
    I made a setlocale( LC_ALL, "" ) call. Even then I still cannot type in
    Japanese: when the focus is in the console, the Japanese IME, which is
    otherwise displayed, disappears. Hence the question about the usability of
    wide chars and wide functions.
    - Why do we need setlocale when using wchar_t to display Japanese strings?
    Even with this workaround, I'm still unable to enter Japanese strings via
    the console.
    - This distinction doesn't seem to extend to file streams. So, does console
    i/o differ from disk i/o? I tried reading from/writing to files with
    Japanese strings using both char and wchar_t (sans setlocale) without any
    issues.
     
    Regards,
    Vidya.
     
     
     
    -----Original Message-----
    From: Shawn Steele [mailto:Shawn.Steele@microsoft.com]
    Sent: Tuesday, October 25, 2005 11:37 PM
    To: Vidya Maheshwar Nabar; unicode@unicode.org
    Subject: RE: wide chars and methods
     
    Windows 2000 Server is natively Unicode. You cannot "hold all the
    characters" of that OS in an 8-bit char.

    The Windows console is restricted to "ansi" code pages, which is probably
    why you're seeing this behavior. It's strongly recommended that you avoid
    ANSI applications and use Unicode instead.
     
    - Shawn
     
    SDE, Microsoft
     

      _____

    From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On
    Behalf Of Vidya Maheshwar Nabar
    Sent: Tuesday, October 25, 2005 2:55 AM
    To: unicode@unicode.org
    Subject: wide chars and methods
     
     
    Hi,
     
    I wanted to know why the 'wchar_t' data type is required if a 'char' can
    hold all the characters on a given OS. To elaborate, I run the program
    below on a Japanese Win 2K Server and pass it Japanese strings:
     
    Code snippet (VC++ 6.0):
    char str[MAX];
    cin >> str;
    cout << str << endl;
     
    Input:
    ません

    Output:
    ません
     
    Note: here, the input is U+307E, U+305B and U+3093.
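    Why a plain char array works here: code page 932 (Shift_JIS) is a double-byte encoding, so each of these hiragana is stored as two chars, and cin/cout simply copy the bytes through. A minimal sketch limited to hiragana (U+3041 to U+3093, which occupy one contiguous run after lead byte 0x82); the helper names are hypothetical, and a real converter would use the full code page 932 table:

```cpp
#include <array>
#include <optional>
#include <string>

// Hypothetical helper: the two-byte Shift_JIS form of one hiragana.
// U+3041..U+3093 map to 0x82 0x9F..0x82 0xF1 in order.
std::optional<std::array<unsigned char, 2>> sjis_hiragana(char32_t cp) {
    if (cp < U'\u3041' || cp > U'\u3093') return std::nullopt;
    return std::array<unsigned char, 2>{
        0x82, static_cast<unsigned char>(0x9F + (cp - 0x3041))};
}

// Encodes a hiragana-only string; nullopt if any character is outside
// the range handled above.
std::optional<std::string> sjis_encode_hiragana(const std::u32string& s) {
    std::string out;
    for (char32_t cp : s) {
        auto bytes = sjis_hiragana(cp);
        if (!bytes) return std::nullopt;
        out.push_back(static_cast<char>((*bytes)[0]));
        out.push_back(static_cast<char>((*bytes)[1]));
    }
    return out;
}
```

    For U+307E, U+305B, U+3093 this produces 82 DC 82 B9 82 F1: six chars for three characters.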
     
    The above program runs fine with char and cin/cout/scanf/printf, but
    things go wrong when I use wchar_t and wcin/wcout/wscanf/wprintf: it just
    doesn't output anything!
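    One plausible reason wcout goes silent rather than printing garbage: when a wide character cannot be converted for output, many implementations set badbit (or failbit) on the stream, and an iostream in a failed state ignores every later insertion until clear() is called. A minimal sketch using a std::wostringstream, with the failure simulated by setstate; write_after_failure is a hypothetical name:

```cpp
#include <sstream>
#include <string>

// Shows which insertions reach a wide stream when a failure flag is set
// part-way through, and how clear() restores output.
std::wstring write_after_failure() {
    std::wostringstream out;
    out << L"before ";
    out.setstate(std::ios_base::badbit);  // simulate a failed conversion
    out << L"ignored ";                   // sentry sees !good(): nothing written
    out.clear();                          // reset all state flags
    out << L"after";
    return out.str();
}
```

    So after one unconvertible character, checking std::wcout.fail() and calling std::wcout.clear() is worth trying while debugging.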
     
    How and why are cin/cout/scanf/printf able to process Japanese strings on
    a Japanese machine with char, but not wcin/wcout/wscanf/wprintf with
    wchar_t? Aren't wchar_t and the wide functions needed for characters beyond
    the 8-bit range, since char can't handle them? Am I missing something here?
     
    Thanks in advance.
     
    Regards,
    Vidya.
     




    This archive was generated by hypermail 2.1.5 : Thu Nov 10 2005 - 11:44:02 CST