Re: Mapping of SJIS control characters

From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Mar 23 2009 - 18:29:37 CST

    Tim Greenwood asked:

    > This question really belongs in the ICU-support mail list, but I tried
    > there and had no response. Some of the people who hang out here are
    > good at answering these obscure questions.
    >
    > The mapping from SJIS to Unicode (as seen on
    > http://www.icu-project.org/icu-bin/convexp?conv=ibm-943_P15A-2003&s=ALL
    > ) has three odd conversions in the control range.
    >
    > 0x1A -> 0x1C
    > 0x1C -> 0x7F
    > 0x7F -> 0x1A
    >
    > I do not see anything equivalent in EUCJP mappings, nor can I find any
    > reference that shows JIS201 differing from standard practice in the
    > control codes.

    I wouldn't expect that behavior at all to be derived from JIS X 0201
    or EUC-JP. And I know I certainly don't support that kind of
    mapping for any SJIS variety.
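
    As a minimal sketch of what the quoted table entries amount to: the
    three odd mappings form a closed cycle among control codes, so the
    conversion still round-trips. The helper names below are hypothetical,
    and treating every other control code as an identity mapping is an
    assumption for illustration, not something checked against the ICU table.

```python
# The three odd entries quoted above for ibm-943_P15A-2003 (SJIS byte -> Unicode).
ODD_CONTROLS = {0x1A: 0x1C, 0x1C: 0x7F, 0x7F: 0x1A}

# Reverse direction (Unicode -> SJIS byte), derived by flipping each pair.
ODD_CONTROLS_INVERSE = {u: b for b, u in ODD_CONTROLS.items()}

def to_unicode(byte: int) -> int:
    """Hypothetical helper: map one control byte to a code point.
    Assumption: control codes not listed above map to themselves."""
    return ODD_CONTROLS.get(byte, byte)

def from_unicode(code_point: int) -> int:
    """Hypothetical helper: inverse of to_unicode for control codes."""
    return ODD_CONTROLS_INVERSE.get(code_point, code_point)

# C0 controls plus DEL.
controls = list(range(0x20)) + [0x7F]

# The mapping only permutes the control set, so nothing collides ...
assert sorted(to_unicode(b) for b in controls) == controls
# ... and every control byte survives a round trip.
assert all(from_unicode(to_unicode(b)) == b for b in controls)
```

    The point of the sketch is only that the table is a permutation, which
    is what one would expect from a mapping built for round-trip fidelity
    rather than from a standard.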

    > I know that Unicode no longer supports these mapping tables, and even
    > when it did the SJIS table does not define these ranges.

    And I don't think it derives from any SJIS mapping ever posted
    on the Unicode website.

    >
    > Can anyone shed any light on this issue?

    My best guess is that this is an empirical mapping based
    on testing actual mapping behavior on one (or more)
    Windows varieties.

    And I suspect what is involved is some weird backwards compatibility
    issue having to do with the implementation of Ctrl-Z EOF marks
    in MS-DOS. 0x1A is always strange, because its ISO 6429 definition
    is SUB, but it saw its widest usage in the CP/M --> MS-DOS line
    of OS development as an EOF mark.

    http://en.wikipedia.org/wiki/End_of_file
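
    For anyone who hasn't run into the Ctrl-Z convention, here is a rough
    sketch of it: a text-mode reader treats the first 0x1A byte as the end
    of the file and ignores whatever follows. The function name is
    hypothetical and this is not any particular OS or library API.

```python
CTRL_Z = 0x1A  # SUB in ISO 6429; treated as an EOF mark in the CP/M and MS-DOS convention

def read_text_until_ctrl_z(data: bytes) -> bytes:
    """Hypothetical helper: return the bytes before the first Ctrl-Z,
    or all of the data if no Ctrl-Z is present."""
    end = data.find(bytes([CTRL_Z]))
    return data if end == -1 else data[:end]

# Anything after the Ctrl-Z is treated as padding and dropped.
text = read_text_until_ctrl_z(b"hello\r\n\x1apadding")
```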

    Could be wrong, though. ;-)

    --Ken


