Shift-JIS/Unicode mapping in JAVA

From: Jane Liu
Date: Wed May 28 2003 - 15:36:39 EDT


    I am running a Java program on a Japanese Windows 2000 system and
    looking at how the following four characters are converted from
    Shift-JIS encoding (MS-CP932) to Unicode in both JRE 1.3.1 and
    JRE 1.4.1. I noticed some interesting changes:

    JRE 1.3.1 converts them the same way Microsoft does:

    0x815C (―) -> U+2015 (―) Horizontal Bar
    0x8160 (～) -> U+FF5E (～) Full-width Tilde
    0x8161 (∥) -> U+2225 (∥) Parallel To
    0x817C (－) -> U+FF0D (－) Full-width Hyphen

    JRE 1.4.1 converts them the same way ICU does:

    0x815C (―) -> U+2014 () EM Dash
    0x8160 (～) -> U+301C (〜) Wave Dash
    0x8161 (∥) -> U+2016 (‖) Double Vertical Line
    0x817C (－) -> U+2212 (−) Minus Sign

    Obviously, this causes backward-compatibility and forward-migration
    issues. The program itself has not changed. Those four Japanese
    characters worked perfectly with the older JRE 1.3.1. Now that we
    have moved up to JRE 1.4.1, however, three of the four characters
    are displayed differently, and the fourth, the "Double Vertical
    Line", shows up as a dot in the UI because U+2016 is not defined in
    the Japanese TrueType font "MS Gothic".

    Can someone help, please? Why did Sun make these changes? It is hard
    for me to believe this is just a mistake in the new mapping table.
    If Sun had good reasons for the change, does that mean I need to
    change my code as well? If so, where and how should I change it?
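
One possible stopgap, sketched below, is to remap the four JRE 1.4.1 code points back to the ones JRE 1.3.1 produced after the text has been decoded. This is only an assumption about what might work here, not something Sun recommends; the class and method names are hypothetical. It uses StringBuffer so that it stays within what JRE 1.4.1 offers.

```java
// Hypothetical post-decode fix-up: maps the four code points listed for
// JRE 1.4.1 back to the ones listed for JRE 1.3.1.
class JisToMsRemap {
    static String toMicrosoftStyle(String text) {
        StringBuffer out = new StringBuffer(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '\u2014': out.append('\u2015'); break; // EM Dash -> Horizontal Bar
                case '\u301C': out.append('\uFF5E'); break; // Wave Dash -> Full-width Tilde
                case '\u2016': out.append('\u2225'); break; // Double Vertical Line -> Parallel To
                case '\u2212': out.append('\uFF0D'); break; // Minus Sign -> Full-width Hyphen
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String fixed = toMicrosoftStyle("A\u2016B");
        // Print the code point now sitting between "A" and "B".
        System.out.println(Integer.toHexString(fixed.charAt(1)));
    }
}
```

The remap is lossy: if the text can also contain a genuine U+2014 or U+2212 from some other source, those would be rewritten too, so it only makes sense on strings known to come from the Shift-JIS decoder.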


