RE: Unicode Imcompatibilities on Windows 95/NT

From: Kazuhiro Kazama (kazama@ingrid.org)
Date: Tue Jan 06 1998 - 02:28:44 EST


From: Lori Brownell <loribr@MICROSOFT.com>
Subject: RE: Unicode Imcompatibilities on Windows 95/NT
Date: Mon, 5 Jan 1998 21:54:25 -0800
> I believe you are referring to ShiftJIS.txt in the Unicode v2 CD
> (\DOS\Mappings\EastAsia\JIS). This file was prepared by Glenn Adams

JIS (Japanese Industrial Standards) has already defined conversions
between JIS X 0201, JIS X 0208, JIS X 0212 and JIS X 0221 (= ISO/IEC
10646 = Unicode 2.0).

> Those that you mention in your earlier mail, U+301c and U+2212, are not in
> the CP932.txt file at all, therefore those 2 Unicode characters would not
> map into code page 932 on either Windows 95 or Windows NT, and would end up
> mapping to the default character. Please use the official Microsoft
> CP932.txt for the Microsoft ShiftJIS mapping description, or an even better
> method would be to use the MultiByteToWideChar and WideCharToMultiByte APIs.

There is no serious problem within a single Unicode-based system. But
there are many problems when different Unicode-based systems
communicate with each other.

When a Java program runs on Windows 95/NT, it communicates with Win32
native methods in Unicode.

When a user inputs a JIS X 0208 "WAVE DASH", it is converted to Unicode
"FULLWIDTH TILDE" (U+FF5E) by the Win32 conversion APIs and passed to
the Java program.

When a user reads the same JIS X 0208 "WAVE DASH", it is converted to
Unicode "WAVE DASH" (U+301C) by the sun.io converter.
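
The mismatch can be reproduced with a small sketch. It uses Python's
built-in "shift_jis" and "cp932" codecs as stand-ins for the sun.io
converter and the Win32 conversion APIs. That pairing is an assumption
made for illustration; it is not the code path of either vendor.

```python
# The two bytes that Shift_JIS uses for JIS X 0208 "WAVE DASH" (row 1, cell 33).
WAVE_DASH_SJIS = b"\x81\x60"

# Stand-in for a JIS-style converter (assumption: behaves like sun.io here).
via_jis_table = WAVE_DASH_SJIS.decode("shift_jis")

# Stand-in for the Microsoft code page 932 conversion (assumption).
via_cp932_table = WAVE_DASH_SJIS.decode("cp932")

# Show which Unicode code point each table produced for the same bytes.
print(f"shift_jis: U+{ord(via_jis_table):04X}")
print(f"cp932:     U+{ord(via_cp932_table):04X}")
print("equal:", via_jis_table == via_cp932_table)
```

The same two input bytes come out as two different Unicode code points,
depending only on which conversion table was applied.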

This shows that a programmer can't compare Japanese text properly, and
that a program can't display or output characters correctly in some
situations.
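
One possible approach, shown here only as a sketch, is to fold the
vendor-specific code points to one form before comparing. The table
name CP932_TO_JIS_STYLE and the helper fold_vendor_variants are
hypothetical, and the table lists only the two characters discussed in
this thread. Python's "shift_jis" and "cp932" codecs again stand in for
the two converters, which is an assumption.

```python
# Hypothetical folding table: Microsoft-style code point -> JIS-style code point.
# Only the two characters discussed in this thread are listed.
CP932_TO_JIS_STYLE = {
    0xFF5E: 0x301C,  # FULLWIDTH TILDE -> WAVE DASH
    0xFF0D: 0x2212,  # FULLWIDTH HYPHEN-MINUS -> MINUS SIGN
}


def fold_vendor_variants(text: str) -> str:
    """Replace the listed Microsoft-style code points with the JIS-style ones."""
    return text.translate(CP932_TO_JIS_STYLE)


# "10", JIS X 0208 WAVE DASH, "20" as Shift_JIS bytes.
raw = b"10" + b"\x81\x60" + b"20"

from_win32_style = raw.decode("cp932")     # stand-in for the Win32 APIs
from_jis_style = raw.decode("shift_jis")   # stand-in for the sun.io converter

# A direct comparison fails, although both strings came from the same bytes.
print(from_win32_style == from_jis_style)

# After folding, the two strings compare equal.
print(fold_vendor_variants(from_win32_style) == fold_vendor_variants(from_jis_style))
```

Folding makes the comparison work, but it also erases the distinction
for text that really does contain FULLWIDTH TILDE, so it only hides the
problem on the comparing side and does not fix fonts or converters.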

These incompatibilities aren't only Java-specific issues.

Some W3C standards will use UTF-8 for communication and data storage.
The same problems will arise in these situations. For example, we
Japanese must distinguish between Microsoft-specific HTML (or XML)
files and others.

I think these problems aren't collation issues, because they are
vendor-specific, and parts other than the collator (e.g. font
encodings, converters, etc.) must be modified by all vendors.

Does anyone know a good solution or a good workaround?

Thanks.

Kazuhiro Kazama (kazama@ingrid.org) Ingrid Project


