Re: Unicode Incompatibilities on Windows 95/NT

From: Kazuhiro Kazama (kazama@ingrid.org)
Date: Mon Jan 05 1998 - 23:02:23 EST


Thank you, Kenneth.

From: kenw@sybase.com (Kenneth Whistler)
Subject: Re: Unicode Incompatibilities on Windows 95/NT
Date: Mon, 5 Jan 1998 11:32:24 -0800
> The table CP932.TXT (dated April 14, 1996, provided by Microsoft)
> shows the actual mapping of Microsoft Windows Code Page 932 to
> Unicode.

I had mistakenly assumed that Microsoft's Cp 932 is upward compatible
with the Japanese Shift-JIS encoding.

> This comment ignores the fact that many of these JIS <==> DBCS
> Asian vendor code page mapping issues already existed as legacy
> issues for DOS and Windows-based code pages.

In fact, many vendors use their own Shift-JIS variants. But they
only add characters in the areas that JIS left undefined.

Cp 932, however, changes the character mapping in the area that JIS
does define.

> Conversion between legacy character sets through Unicode must
> always be done with care for the particular problems of mismatches
> and/or non-one-to-one conversions required. This is especially

There is a difference between the YEN SIGN problem and the Cp 932
problem. The YEN SIGN problem is Japan-specific but not
vendor-specific, whereas the Cp 932 problem is vendor-specific.

And the Cp 932 problem produces incompatible UTF-8 representations of
the same text. I think this is a serious problem because UTF-8
will be used in many web standards.

For example, many Japanese users store their Japanese text in the
Shift-JIS encoding. If we use only characters in JIS X 0201, 0208 and
0212, there is no difference between vendors.

But a file converted to UTF-8 on Windows 95 is "not" equal to the same
file converted to UTF-8 by Java.
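
As a rough illustration of that mismatch, the sketch below encodes the
two code points that one Shift-JIS character can become and prints the
resulting UTF-8 bytes. It assumes the Shift-JIS byte pair 0x81 0x60,
which a JIS-based table maps to WAVE DASH and CP932.TXT maps to
FULLWIDTH TILDE. The class and method names are hypothetical, and the
two code points are hardcoded rather than decoded from a real file.

```java
import java.nio.charset.StandardCharsets;

class Utf8Mismatch {
    // Assumed result of decoding 0x81 0x60 with a JIS-based table: WAVE DASH (U+301C).
    static final String JIS_BASED = "\u301c";
    // Assumed result of decoding the same bytes with CP932.TXT: FULLWIDTH TILDE (U+FF5E).
    static final String CP932_BASED = "\uff5e";

    // Returns the UTF-8 encoding of s as space-separated uppercase hex bytes.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("JIS-based table: " + utf8Hex(JIS_BASED));   // E3 80 9C
        System.out.println("CP932.TXT table: " + utf8Hex(CP932_BASED)); // EF BD 9E
    }
}
```

The two byte sequences differ, so a byte-wise comparison or search
across the two UTF-8 files fails even though the Shift-JIS originals
were identical.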

And I don't think this is a collation issue. In fact, I will show a
sample Java program. This sample program treats "WAVE DASH" and
"FULLWIDTH TILDE" as different characters in the Japanese locale.

How do you think we Japanese should deal with this problem?

I think that the best solution is to support the JIS X 0221 standard
mapping in Cp 932. Is there a good workaround?
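
One possible stopgap, sketched below purely as an assumption and not
as an established practice, is to rewrite the Cp 932-specific code
points to their JIS-based counterparts after conversion. The sketch
covers only the two pairs named in this message, and the class and
method names are hypothetical. The substitution is lossy: text that
really contains FULLWIDTH TILDE or FULLWIDTH HYPHEN-MINUS would be
rewritten as well.

```java
class Cp932Normalizer {
    // Rewrites the two Cp 932-specific code points named in this thread
    // to the code points a JIS-based mapping would have produced.
    static String toJisMapping(String s) {
        return s.replace('\uff5e', '\u301c')   // FULLWIDTH TILDE -> WAVE DASH
                .replace('\uff0d', '\u2212');  // FULLWIDTH HYPHEN-MINUS -> MINUS SIGN
    }

    public static void main(String[] args) {
        String fromWindows = "10\uff5e20"; // as converted with the Cp 932 table
        String fromJava = "10\u301c20";    // as converted with a JIS-based table
        System.out.println(toJisMapping(fromWindows).equals(fromJava)); // true
    }
}
```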

Kazuhiro Kazama (kazama@ingrid.org) Ingrid Project

----
import java.text.*;
import java.util.*;

class MSCollation {
    public static void main(String args[]) {
        // Compare at primary strength, the loosest level the collator offers.
        Collator c = Collator.getInstance(Locale.JAPANESE);
        c.setStrength(Collator.PRIMARY);

        // Shift-JIS 0x8160: WAVE DASH (JIS) vs. FULLWIDTH TILDE (Cp 932)
        if (c.equals("\u301c", "\uff5e"))
            System.out.println("\"WAVE DASH\" is equal to \"FULLWIDTH TILDE\".");
        else
            System.out.println("\"WAVE DASH\" is not equal to \"FULLWIDTH TILDE\".");

        // Shift-JIS 0x817C: MINUS SIGN (JIS) vs. FULLWIDTH HYPHEN-MINUS (Cp 932)
        if (c.equals("\u2212", "\uff0d"))
            System.out.println("\"MINUS SIGN\" is equal to \"FULLWIDTH HYPHEN-MINUS\".");
        else
            System.out.println("\"MINUS SIGN\" is not equal to \"FULLWIDTH HYPHEN-MINUS\".");
    }
}


