RE: Unicode Incompatibilities on Windows 95/NT

From: Lori Brownell (loribr@microsoft.com)
Date: Tue Jan 06 1998 - 00:54:25 EST


I believe you are referring to ShiftJIS.txt on the Unicode v2 CD
(\DOS\Mappings\EastAsia\JIS). This file was prepared by Glenn Adams
<glenn@metis.com> and Jon H. Jenkins <John_Jenkins@taligent.com>. It is not
an official Microsoft conversion table for ShiftJIS, and it does not
represent the conversion actually performed on Windows 95 or Windows NT by
the Unicode conversion APIs. The official Microsoft encoding is described in
CP932.txt (\DOS\Mappings\Vendors\MICSFT\Windows).

The following are the differences between the two files. The list contains
only ShiftJIS code values that appear in both tables but map to different
Unicode characters. Mappings that appear in one table but not the other are
not listed. The first column is the ShiftJIS code value. The 2nd column
(labeled cp932) is the Unicode equivalent in the official Microsoft 932
mapping file. The 3rd column (labeled G.Adams) is the Unicode equivalent in
the ShiftJIS.txt file on the Unicode v2 CD. The final column gives the
Unicode name of the 2nd-column value, followed by that of the 3rd-column
value.

        XJIS    cp932   G.Adams  Unicode names (cp932 - G.Adams)
        0x5C    U+005C  U+00A5   Reverse Solidus - Yen Sign
        0x7E    U+007E  U+203E   Tilde - Overline
        0x815F  U+FF3C  U+00A5   Fullwidth Reverse Solidus - Yen Sign
        0x8160  U+FF5E  U+301C   Fullwidth Tilde - Wave Dash
        0x8161  U+2225  U+2016   Parallel To - Double Vertical
        0x817C  U+FF0D  U+2212   Fullwidth Hyphen-Minus - Minus Sign
        0x8191  U+FFE0  U+00A2   Fullwidth Cent Sign - Cent Sign
        0x8192  U+FFE1  U+00A3   Fullwidth Pound Sign - Pound Sign
        0x81CA  U+FFE2  U+00AC   Fullwidth Not Sign - Not Sign
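
For anyone who wants to work with these rows programmatically, here is a
minimal Python sketch that holds the table as data and checks the cp932
column against Python's built-in "cp932" codec. Note the assumptions: the
names ROWS and cp932_column_mismatches are made up for this illustration,
and Python's codec is an independent implementation, not the Windows
conversion APIs, so agreement is only a sanity check.

```python
# Rows copied from the table above:
# (ShiftJIS bytes, cp932 code point, G.Adams code point).
ROWS = [
    (b"\x5c",     0x005C, 0x00A5),
    (b"\x7e",     0x007E, 0x203E),
    (b"\x81\x5f", 0xFF3C, 0x00A5),
    (b"\x81\x60", 0xFF5E, 0x301C),
    (b"\x81\x61", 0x2225, 0x2016),
    (b"\x81\x7c", 0xFF0D, 0x2212),
    (b"\x81\x91", 0xFFE0, 0x00A2),
    (b"\x81\x92", 0xFFE1, 0x00A3),
    (b"\x81\xca", 0xFFE2, 0x00AC),
]


def cp932_column_mismatches(rows=ROWS):
    """Return the rows whose cp932 column disagrees with Python's codec."""
    mismatches = []
    for sjis, cp932_cp, _adams_cp in rows:
        decoded = sjis.decode("cp932")
        if decoded != chr(cp932_cp):
            mismatches.append((sjis, cp932_cp, decoded))
    return mismatches


if __name__ == "__main__":
    for sjis, cp932_cp, adams_cp in ROWS:
        print(f"0x{sjis.hex().upper():<4}  cp932=U+{cp932_cp:04X}  "
              f"G.Adams=U+{adams_cp:04X}")
    print("cp932 column mismatches:", cp932_column_mismatches())
```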

The two characters you mention in your earlier mail, U+301C and U+2212, are
not in the CP932.txt file at all. Those 2 Unicode characters therefore do
not map into code page 932 on either Windows 95 or Windows NT, and would be
converted to the default character. Please use the official Microsoft
CP932.txt as the description of the Microsoft ShiftJIS mapping, or, better
still, use the MultiByteToWideChar and WideCharToMultiByte APIs.

Thanks,
Lori Brownell
Windows NT Program Manager
Microsoft Corporation

> -----Original Message-----
> From: kazama@ingrid.org [SMTP:kazama@ingrid.org]
> Sent: Monday, January 05, 1998 9:55 AM
> To: Multiple Recipients of
> Subject: Unicode Incompatibilities on Windows 95/NT
>
> Recently, many Japanese programmers have reported incompatibilities in
> the Unicode used by Microsoft Windows 95/NT.
>
> As a result of my tests, I found that Microsoft uses its own encoding
> conversion scheme.
>
> For example, "WAVE DASH" of JIS X 0208 is ordinarily converted to
> "WAVE DASH" (U+301C) of Unicode (e.g. JIS X 0221 = ISO/IEC 10646). But
> Windows 95/NT converts it to "FULLWIDTH TILDE" (U+FF5E).
>
> And "MINUS SIGN" of JIS X 0208 is ordinarily converted to "MINUS SIGN"
> (U+2212). But Windows 95/NT converts it to "FULLWIDTH HYPHEN-MINUS"
> (U+FF0D).
>
> These differences in encoding conversion produce incompatibilities
> between different Unicode-based systems (e.g. Windows and Java).
>
> Microsoft may want to use the "Halfwidth and Fullwidth Forms" area. But
> Windows 95/NT are rich text systems, and fonts with appropriate glyph
> sizes and widths can easily be designed for them.
>
> Why does Microsoft use non-standard encoding conversions even though
> they produce incompatibilities? Are these bugs?
>
> Kazuhiro Kazama (kazama@ingrid.org) Ingrid Project


