RE: NT & UTF8

From: Frank da Cruz (fdc@watsun.cc.columbia.edu)
Date: Sun Oct 03 1999 - 14:09:35 EDT


Addison wrote:

> Frank wrote:
>
> >I suppose it makes sense that the situation would be better in NT. Win9x
> >also has the CHCP command, but you have to reboot after using it, and you
> >are also cautioned against using it at all, for fear of "disk corruption".
> >Furthermore, US versions of Windows supply only two code pages for you to
> >change between: CP437 and CP850.
>
> The situation in Win9x is absolutely awful.
>
Yes, I once wasted a day trying to load CP866 onto the US version of Win95.
In the end, I had to reformat my disk and reinstall Windows.

> At least with Win3.1 you could
> install any (Western) code page and change with impunity.
>
And DOS too, even on the fly. Our DOS terminal emulator let you switch
between Latin-1, Latin-2, Hebrew, and Cyrillic at will. Things were too
easy in those days :-)
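
To make the code page point concrete, here is a minimal Python sketch of why
the active code page matters: the same byte value decodes to a different
character under each one. The choice of ISO 8859 parts for "Latin-1, Latin-2,
Hebrew, and Cyrillic" is an assumption made for illustration (the emulator's
actual tables are not given here), and the names CODE_PAGES and
decode_under_each are hypothetical.

```python
# One byte value, interpreted under several 8-bit code pages.
# ASSUMPTION: the ISO 8859 parts below stand in for the "Latin-1, Latin-2,
# Hebrew, and Cyrillic" sets mentioned above; the real tables may differ.
CODE_PAGES = {
    "Latin-1": "iso8859_1",
    "Latin-2": "iso8859_2",
    "Hebrew": "iso8859_8",
    "Cyrillic": "iso8859_5",
    "CP437": "cp437",
    "CP850": "cp850",
    "CP866": "cp866",
}


def decode_under_each(data: bytes) -> dict:
    """Decode the same bytes once per code page and return label -> text."""
    return {label: data.decode(codec) for label, codec in CODE_PAGES.items()}


if __name__ == "__main__":
    # 0xE0 is outside ASCII, so its meaning depends entirely on the code page.
    for label, text in decode_under_each(bytes([0xE0])).items():
        print(f"{label:9s} U+{ord(text):04X} {text}")
```

Bytes below 0x80 come out the same under every table listed here, which is
why plain ASCII text survives a code page switch and everything else does not.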

> >> chcp 10000 changes to Unicode (well, UCS-2) and you can display real
> >> Unicode text in your "DOS" shell.
> >>
> >This is interesting. What happens, then, when you try to run a
> >non-UCS2-aware application in the console window? That is, something
> >that prints only ASCII strings (without NULs between each character)? Is
> >it total garbage, or does it display correctly by some magic?
>
> You get what you'd expect: garbage on the display. This is *not* wrong: I
> expect the display to obey its code page setting in all instances.
>
Of course. I expected the answer to be "total garbage" and in fact would
have found any other answer to be more than a little scary.
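
For anyone who wants to see the byte arithmetic behind "total garbage", here is
a small Python sketch. It does not reproduce the NT console itself; it only
shows what happens when single-byte ASCII output is read as 16-bit code units.
Little-endian byte order is an assumption, and the function name
view_ascii_as_ucs2 is hypothetical.

```python
def view_ascii_as_ucs2(data: bytes) -> str:
    """Reinterpret raw bytes as 16-bit code units (ASSUMED little-endian).

    An odd trailing byte is padded with NUL so every byte lands in a unit.
    """
    if len(data) % 2:
        data += b"\x00"
    return data.decode("utf-16-le")


ascii_output = b"Hello, world"          # what a non-UCS-2-aware program writes
ucs2_output = "Hello, world".encode("utf-16-le")  # what the display expects

if __name__ == "__main__":
    garbage = view_ascii_as_ucs2(ascii_output)
    # Each pair of ASCII bytes fuses into one unrelated character.
    print([f"U+{ord(ch):04X}" for ch in garbage])
    # The UCS-2 form carries a NUL byte after each ASCII character.
    print(ucs2_output)
```

Twelve ASCII bytes become six code units, none of which is a Latin letter, so
no amount of display magic could recover the intended text without guessing.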

> Having UTF-8 support would be nice in this regard...
>
I think each platform has to pick one and only one encoding for Unicode to
be used natively, otherwise we get into trouble guessing at (or assuming)
the encoding.

> ... but you have to realize that ASCII doesn't even fully support most
> programs in *English*, let alone other languages. UTF-8's ASCII
> compatibility is best saved for file systems and parsers, not as a
> surrogate for internationalization.
>
And also as an on-the-wire encoding for communications protocols.
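
A rough Python sketch of why ASCII compatibility helps parsers and protocols:
in UTF-8, every byte of a non-ASCII character is 0x80 or above, so a
byte-oriented parser can split on ASCII delimiters without decoding first. The
"Name: value" line format and the function split_header are hypothetical,
chosen only for illustration.

```python
def split_header(line: bytes) -> tuple:
    """Split a hypothetical 'Name: value' line on the first ASCII colon.

    The split works on raw bytes; only the value is decoded as UTF-8.
    """
    name, _, value = line.partition(b":")
    return name.strip().decode("ascii"), value.strip().decode("utf-8")


def has_no_ascii_bytes(text: str) -> bool:
    """True if the UTF-8 form of text contains no byte below 0x80."""
    return all(b >= 0x80 for b in text.encode("utf-8"))


if __name__ == "__main__":
    wire = "Subject: Привет, мир".encode("utf-8")
    print(split_header(wire))
    # Pure ASCII text is byte-for-byte identical in ASCII and UTF-8.
    print("plain text".encode("ascii") == "plain text".encode("utf-8"))
    # Non-ASCII characters can never be mistaken for a delimiter byte.
    print(has_no_ascii_bytes("Привет"))
```

The same split applied to UCS-2 bytes would need to know the code unit width
and byte order before it could even find the colon.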

- Frank



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:53 EDT