RE: I yen for a backslash!

From: Murray Sargent (murrays@exchange.microsoft.com)
Date: Tue Jan 24 2006 - 15:52:00 CST

    Ah, the forever-annoying Yen-sign/backslash ambiguity. In the early days of international computing, some poor misguided souls decided that the backslash wasn't very important in Japanese computing, while the Yen sign clearly was, and that it wasn't convenient to support high ANSI characters (U+00A0 - U+00FF), which include the half-width Yen sign at U+00A5. Accordingly, they simply decided that the ASCII backslash code 0x5C should be redefined to be the Yen sign in Japanese contexts.

    And in today's Unicode world, we're still having to live with this debacle. With Japanese fonts, people just get used to seeing the Yen sign glyph used for the character code U+005C. The bottom line is that U+005C is ambiguous in Japanese contexts: it might represent a backslash and it might represent a Yen sign. Heuristics can be used to try to resolve the ambiguity, but the only real way to solve this quandary is for Japanese text programs to migrate all uses of the Yen sign to U+00A5 or to the full-width Yen sign U+FFE5. But this takes dedication and patience; a lot of Japanese-oriented code out there assumes that U+005C is the Yen sign. Furthermore, code page 932 (Shift-JIS) is broken in this respect: at least in the mapping reference you cite below, there's no mapping of U+00A5 to a Shift-JIS code point. Note that Shift-JIS 0x818F corresponds to U+FFE5 (FULLWIDTH YEN SIGN), whereas U+00A5 is the half-width Yen sign.
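
    As a rough illustration of the mapping described above, here is a minimal Python sketch. It assumes that Python's built-in "cp932" codec follows the same table as the CP932.TXT mapping file; the helper name cp932_hex is made up for this example. It only inspects code points and bytes, so it says nothing about which glyph a Japanese font will draw for 0x5C.

```python
# Sketch only: assumes Python's built-in "cp932" codec mirrors CP932.TXT.
# cp932_hex is a hypothetical helper name used for this illustration.

def cp932_hex(ch):
    """Return the cp932 bytes for ch as a hex string, or None if unmapped."""
    try:
        return ch.encode("cp932").hex()
    except UnicodeEncodeError:
        return None

backslash = cp932_hex("\u005c")      # U+005C REVERSE SOLIDUS
halfwidth_yen = cp932_hex("\u00a5")  # U+00A5 YEN SIGN
fullwidth_yen = cp932_hex("\uffe5")  # U+FFE5 FULLWIDTH YEN SIGN

# Going the other way: the single byte 0x5C decodes to U+005C,
# whatever glyph the font chooses to show for it.
decoded_5c = bytes([0x5C]).decode("cp932")

for name, value in [("U+005C", backslash),
                    ("U+00A5", halfwidth_yen),
                    ("U+FFE5", fullwidth_yen)]:
    print(name, "->", value)
print("0x5C decodes to U+%04X" % ord(decoded_5c))
```

    Under that assumption, U+005C round-trips through the byte 0x5C, U+FFE5 encodes to 0x818F, and U+00A5 has no cp932 encoding at all, which is the gap noted above.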

    Meanwhile for your purposes (NB: I'm no Java expert), U+005C in Java code will be interpreted as a backslash except perhaps in Japanese-sensitive contexts.

    Murray

    -----Original Message-----
    From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On Behalf Of Mike Ayers
    Sent: Tuesday, January 24, 2006 11:15 AM
    To: unicode@unicode.org
    Subject: I yen for a backslash!

            I'm trying to build a project on a Japanese Windows server. I am getting a failure and trying to track it down. I've gotten very confused about backslash and yen. The Unicode mapping
    (ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT)
    places backslash at x5c and yen at x818f. The console window on the Japanese server accepts (alt-numpad) 92 (x5c) as yen. Strangest of all, WordPad (alt-x) shows yen at x5c. Since WordPad deals in Unicode values, I interpret this to mean that a font-level glyph substitution is being used. Can I interpret this to mean that my Java et al. source code will be correctly interpreted by the compilers, even though they appear to have yen signs where all the backslashes should be?

            Thanks,

    /|/|ike


