RE: NT & UTF8

From: Addison Phillips (AddisonP@simultrans.com)
Date: Sun Oct 03 1999 - 21:52:04 EDT


I think we're in agreement. One thing I should note for others: when you
write a Win32 console program, you can (and emphatically should) use
resources to store your strings... and call CharToOem() to convert your
strings to the active code page for your window (which is almost certainly
different from the resource file code page for Western European languages).
If you're writing for NT, then you should certainly consider using a Unicode
solution, given that platform's excellent built-in support for the UCS.
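As a rough illustration of why the conversion matters, the Python sketch below re-encodes a string from an ANSI code page to an OEM code page. This is the kind of translation CharToOem() performs on Windows, but the sketch does not call the Win32 API. The pairing of windows-1252 (ANSI) with code page 850 (OEM) is an assumption typical of Western European systems, and `ansi_to_oem` is a hypothetical helper.

```python
# Sketch: emulate an ANSI -> OEM conversion of the kind CharToOem() performs.
# ASSUMPTION: the resource strings use windows-1252 (ANSI) and the console
# uses code page 850 (OEM), a common Western European pairing.

ANSI_CP = "cp1252"  # assumed code page of the strings stored in resources
OEM_CP = "cp850"    # assumed code page of the console window


def ansi_to_oem(ansi_bytes: bytes) -> bytes:
    """Re-encode bytes from the ANSI code page into the OEM code page.

    In this sketch, characters with no OEM equivalent become '?'.
    """
    text = ansi_bytes.decode(ANSI_CP)
    return text.encode(OEM_CP, errors="replace")


resource_string = "Café über".encode(ANSI_CP)  # bytes as stored in the resource
console_bytes = ansi_to_oem(resource_string)   # bytes the console expects

print(resource_string)  # b'Caf\xe9 \xfcber'
print(console_bytes)    # b'Caf\x82 \x81ber'
```

The same characters need different bytes in the two code pages, so writing the resource bytes unchanged would show the wrong glyphs. On a real Windows system the two code pages would come from GetACP() and GetOEMCP() rather than being hard-coded.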

Addison
        __________________________________________

        Addison Phillips
        Director, Globalization Engineering
        SimulTrans, L.L.C.
        2606 Bayshore Parkway
        Mountain View, California 94043 USA

        +1 650-526-4652 (direct telephone)
        +1 650-969-9959 (facsimile)
        AddisonP@simultrans.com (Internet email)
        http://www.simultrans.com (website)

        "22 languages. One release date."
        __________________________________________

-----Original Message-----
From: Frank da Cruz [mailto:fdc@watsun.cc.columbia.edu]
Sent: Sunday, October 03, 1999 11:10 AM
To: AddisonP@simultrans.com
Cc: unicode@unicode.org
Subject: RE: NT & UTF8

Addison wrote:

> Frank wrote:
>
> >I suppose it makes sense that the situation would be better in NT. Win9x
> >also has the CHCP command, but you have to reboot after using it, and you
> >are also cautioned against using it at all, for fear of "disk corruption".
> >Furthermore, US versions of Windows supply only two code pages for you to
> >change between: CP437 and CP850.
>
> The situation in Win9x is absolutely awful.
>
Yes, I once wasted a day trying to load CP866 onto the US version of Win95.
In the end, I had to reformat my disk and reinstall Windows.

> At least with Win3.1 you could
> install any (Western) code page and change with impunity.
>
And DOS too, even on the fly. Our DOS terminal emulator let you switch
between Latin-1, Latin-2, Hebrew, and Cyrillic at will. Things were too
easy in those days :-)

> >> chcp 10000 changes to Unicode (well, UCS-2) and you can display real
> >> Unicode text in your "DOS" shell.
> >>
> >This is interesting. What happens, then, when you try to run a
> >non-UCS2-aware application in the console window? That is, something
> >that prints only ASCII strings (without NULs between each character)? Is
> >it total garbage, or does it display correctly by some magic?
>
> You get what you'd expect: garbage on the display. This is *not* wrong: I
> expect the display to obey its code page setting in all instances.
>
Of course. I expected the answer to be "total garbage" and in fact would
have found any other answer to be more than a little scary.
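The "garbage" described above can be reproduced with a small sketch. It treats every pair of bytes as one little-endian 16-bit code unit, which is an assumption about how a UCS-2 console on x86 would read an 8-bit program's output. `as_ucs2` is a hypothetical helper.

```python
# Sketch: what a UCS-2 console "sees" when a non-Unicode program writes
# plain ASCII bytes to it.
# ASSUMPTION: each pair of bytes is read as one little-endian 16-bit unit.


def as_ucs2(raw: bytes) -> str:
    """Interpret raw bytes as little-endian 16-bit code units."""
    if len(raw) % 2:
        raw += b"\x00"  # pad an odd trailing byte so the last unit is complete
    return raw.decode("utf-16-le", errors="replace")


ascii_output = b"Hello!"  # six ASCII bytes from an 8-bit program
# Six bytes become three unrelated characters, not "Hello!":
print([hex(ord(c)) for c in as_ucs2(ascii_output)])  # ['0x6548', '0x6c6c', '0x216f']

# The same text written as UCS-2 (a NUL after each ASCII byte) reads correctly.
ucs2_output = "Hello!".encode("utf-16-le")
print(ucs2_output)           # b'H\x00e\x00l\x00l\x00o\x00!\x00'
print(as_ucs2(ucs2_output))  # Hello!
```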

> Having UTF-8 support would be nice in this regard...
>
I think each platform has to pick one and only one encoding for Unicode to be
used natively, otherwise we get into trouble guessing at (or assuming) the
encoding.
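A small sketch of why guessing is unreliable: the same bytes can decode without error under several encodings, so a successful decode says nothing about which one the writer meant. The helper `candidate_decodings` and the choice of encodings to try are illustrative assumptions.

```python
# Sketch: the same bytes often decode "successfully" under several
# encodings, so a clean decode does not identify the writer's encoding.
# The helper and the list of encodings tried are illustrative choices.


def candidate_decodings(raw: bytes, encodings=("utf-8", "utf-16-le", "cp1252")) -> dict:
    """Return {encoding: text} for every encoding that decodes raw without error."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            pass  # this guess is ruled out; the others remain plausible
    return results


# UTF-8 text also "decodes" as windows-1252 (mojibake); only the odd
# byte count rules out UTF-16LE. ascii() keeps the output printable anywhere.
print(ascii(candidate_decodings("été".encode("utf-8"))))
# {'utf-8': '\xe9t\xe9', 'cp1252': '\xc3\xa9t\xc3\xa9'}

# UTF-16LE text is also valid UTF-8 and windows-1252, with embedded NULs.
print(candidate_decodings("AB".encode("utf-16-le")))
# {'utf-8': 'A\x00B\x00', 'utf-16-le': 'AB', 'cp1252': 'A\x00B\x00'}
```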

> ... but you have to realize that ASCII doesn't even fully support most
> programs in *English*, let alone other languages. UTF-8's ASCII
> compatibility is best saved for file systems and parsers, not as a
> surrogate for internationalization.
>
And also as an on-the-wire encoding for communications protocols.
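The ASCII compatibility that makes UTF-8 useful for file systems, parsers, and wire protocols comes from the fact that every byte of a multibyte UTF-8 sequence is 0x80 or higher, so an ASCII delimiter byte can only ever mean that delimiter. The sketch below, using a hypothetical `split_on_ascii` helper and an invented example path, splits UTF-8 bytes on '/' without decoding them, then shows that UTF-16 offers no such guarantee.

```python
# Sketch: byte-level splitting on an ASCII delimiter is safe in UTF-8,
# because every byte of a multibyte UTF-8 sequence is >= 0x80.
# The helper and the example path are hypothetical.


def split_on_ascii(raw: bytes, delimiter: bytes) -> list:
    """Split a byte string on a single ASCII delimiter without decoding it."""
    if len(delimiter) != 1 or delimiter[0] >= 0x80:
        raise ValueError("delimiter must be a single ASCII byte")
    return raw.split(delimiter)


path = "/home/françois/日本語/notes.txt"
parts = split_on_ascii(path.encode("utf-8"), b"/")

# Every piece is still valid UTF-8 and decodes back to the original names.
names = [p.decode("utf-8") for p in parts]
assert names == ["", "home", "françois", "日本語", "notes.txt"]

# No byte of a non-ASCII character can be mistaken for '/' (0x2F) in UTF-8.
assert all(b >= 0x80 for b in "ç日本語".encode("utf-8"))

# UTF-16LE offers no such guarantee: U+2F00 is encoded as 00 2F,
# so a '/' byte appears even though the text contains no slash.
encoded = "\u2f00".encode("utf-16-le")
print(encoded)          # b'\x00/'
print(b"/" in encoded)  # True
```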

- Frank


