Re: Shift-JIS/Unicode mapping in JAVA

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed May 28 2003 - 21:35:11 EDT


    Most probably, Sun updated its conversion tables from ICU, and ICU has this bug, which did not exist in the earlier tables for MS-CP932. So the source of the data may now be different, or there may be an alias problem with the MS-CP932 encoding name.
    Submit this bug to Sun (and probably also to IBM's ICU team) so that it can be corrected.
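
    To tell a table change apart from an alias problem, it may help to decode the four codes under each charset name explicitly. The following is only a sketch for a current JDK: the class name is made up, and it assumes that "Shift_JIS" and "windows-31j" (Java's name for MS-CP932) are the relevant charset names in your runtime. It prints whatever your JRE produces rather than assuming a result.

```java
import java.nio.charset.Charset;

// Hypothetical helper class; the name is not from any library.
final class SjisMappingProbe {
    // The four double-byte Shift-JIS codes discussed in this thread.
    static final int[] CODES = {0x815C, 0x8160, 0x8161, 0x817C};

    // Decodes one double-byte code with the named charset and returns
    // the resulting UTF-16 units as "U+XXXX", separated by spaces.
    static String decode(int code, String charsetName) {
        byte[] bytes = {(byte) (code >> 8), (byte) code};
        String s = new String(bytes, Charset.forName(charsetName));
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed charset names: JIS-style Shift-JIS and Microsoft's CP932.
        String[] names = {"Shift_JIS", "windows-31j"};
        for (String name : names) {
            if (!Charset.isSupported(name)) {
                System.out.println(name + ": not supported by this runtime");
                continue;
            }
            for (int code : CODES) {
                System.out.println(name + " 0x"
                        + Integer.toHexString(code).toUpperCase()
                        + " -> " + decode(code, name));
            }
        }
    }
}
```

    If the two names give different results, requesting the Microsoft charset by its own name in the program may be enough; if they give the same result, the tables themselves are the more likely cause.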

    This is really a regression, unless Microsoft has changed MS-CP932 in Windows XP, .NET, and Windows Server 2003 to better support the new JIS standard based on the unification of the Han script...

    In that case, Microsoft would have corrected its codepage without registering a new one (and the fault lies with Microsoft).
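
    Until the tables are corrected, one possible application-side workaround is to remap the four affected code points after decoding, so that the text matches what JRE 1.3.1 and Microsoft produce. This is only a sketch: the class and method names are hypothetical, and it assumes the decoded text never legitimately contains U+2014, U+301C, U+2016 or U+2212, since those would be rewritten too.

```java
// Hypothetical workaround class; not part of the JRE or ICU.
final class Cp932Compat {
    // Replaces the four JIS-style code points reported for JRE 1.4.1
    // with the Microsoft-style ones reported for JRE 1.3.1.
    static String toMicrosoftStyle(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\u2014': c = '\u2015'; break; // EM DASH -> HORIZONTAL BAR
                case '\u301C': c = '\uFF5E'; break; // WAVE DASH -> FULLWIDTH TILDE
                case '\u2016': c = '\u2225'; break; // DOUBLE VERTICAL LINE -> PARALLEL TO
                case '\u2212': c = '\uFF0D'; break; // MINUS SIGN -> FULLWIDTH HYPHEN-MINUS
                default: break;                     // everything else is left unchanged
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String decoded = "\u2014\u301C\u2016\u2212";
        String fixed = toMicrosoftStyle(decoded);
        for (int i = 0; i < fixed.length(); i++) {
            System.out.println(String.format("U+%04X", (int) fixed.charAt(i)));
        }
    }
}
```

    The reverse mapping would be needed before encoding back to Shift-JIS if the encoder has changed in the same way.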

    -- Philippe.
    ----- Original Message -----
    From: "Jane Liu" <xjliu_ca@yahoo.com>
    To: <unicode@unicode.org>
    Sent: Wednesday, May 28, 2003 9:36 PM
    Subject: Shift-JIS/Unicode mapping in JAVA

    > Hi,
    >
    > I am running a Java program on a Japanese Windows 2000 system, looking
    > at the Unicode conversion of the following four characters from
    > Shift-JIS encoding (MS-CP932) in both JRE 1.3.1 and JRE 1.4.1, and
    > noticed some interesting changes:
    >
    > JRE 1.3.1 converts them just as Microsoft does:
    >
    > 0x815C (―) -> U+2015 (―) Horizontal Bar
    > 0x8160 (～) -> U+FF5E (～) Full-width Tilde
    > 0x8161 (∥) -> U+2225 (∥) Parallel To
    > 0x817C (－) -> U+FF0D (－) Full-width Hyphen
    >
    > JRE 1.4.1 converts them just as ICU does:
    >
    > 0x815C (―) -> U+2014 (-) EM Dash
    > 0x8160 (～) -> U+301C (〜) Wave Dash
    > 0x8161 (∥) -> U+2016 (‖) Double Vertical Line
    > 0x817C (－) -> U+2212 (−) Minus Sign
    >
    > Obviously, this causes some backward-compatibility and forward-migration
    > issues. I have exactly the same program. Those four Japanese
    > characters worked perfectly with the older JRE version
    > 1.3.1. However, now that we have moved up to JRE 1.4.1, three of the four
    > characters are displayed differently, and one, the "Double
    > Vertical Line", becomes a dot in the UI because U+2016 is not defined
    > in the Japanese TrueType font "MS Gothic".
    >
    > Can someone help, please? Why did Sun make these changes? It's hard
    > for me to believe this is just a mistake in the new mapping table. If Sun
    > does have good reasons, they may also require some code changes;
    > is that true? Where and how should I change my code?
    >
    > Thanks.
    >
    > Jane



    This archive was generated by hypermail 2.1.5 : Wed May 28 2003 - 22:19:12 EDT