Re: Converting between Shift-JIS and Unicode

From: John Jenkins (
Date: Thu Apr 01 2004 - 18:25:31 EST

    On Apr 1, 2004, at 3:41 PM, Rick Cameron wrote:

    > It appears that Unihan.txt does not include mappings to Shift-JIS, and
    > that the only file on that contains mappings between
    > Shift-JIS and Unicode is in the 'obsolete' section.

    Unihan.txt *does* include mappings between Shift-JIS and Unicode for the
    ideographic portions of Unicode. It always has. The kJis0 field gives
    the JIS X 0208 code in ku/ten (row-cell) form, so you may have to
    convert it to the Shift-JIS byte form you're used to, but it's there.
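
    As a rough illustration of that conversion, here is a minimal Python
    sketch. It assumes a kJis0 value is four decimal digits (two for ku,
    two for ten) and that Unihan lines are tab-separated as code point,
    field name, value. The function names kuten_to_sjis and
    parse_kjis0_line are made up for this example; the arithmetic is the
    usual JIS X 0208 row-cell to Shift-JIS folding, not anything taken
    from Unihan.txt itself.

```python
def kuten_to_sjis(ku: int, ten: int) -> bytes:
    """Fold a JIS X 0208 ku/ten pair (each 1..94) into two Shift-JIS bytes."""
    if not (1 <= ku <= 94 and 1 <= ten <= 94):
        raise ValueError(f"ku/ten out of range: {ku}-{ten}")
    # Two rows share one lead byte; lead bytes skip the 0xA0..0xDF range.
    lead = (ku - 1) // 2 + (0x81 if ku <= 62 else 0xC1)
    if ku % 2 == 1:
        # Odd rows use trail bytes 0x40..0x9E, skipping 0x7F.
        trail = ten + 0x3F + (1 if ten >= 64 else 0)
    else:
        # Even rows use trail bytes 0x9F..0xFC.
        trail = ten + 0x9E
    return bytes([lead, trail])


def parse_kjis0_line(line: str):
    """Return (character, Shift-JIS bytes) for a kJis0 line, else None.

    Assumes the layout 'U+XXXX<TAB>kJis0<TAB>KKTT'.
    """
    if not line.strip() or line.startswith("#"):
        return None
    parts = line.rstrip("\n").split("\t", 2)
    if len(parts) != 3 or parts[1] != "kJis0":
        return None
    code_point, _, value = parts
    ku, ten = int(value[:2]), int(value[2:4])
    return chr(int(code_point[2:], 16)), kuten_to_sjis(ku, ten)
```

    For example, parse_kjis0_line("U+4E9C\tkJis0\t1601") pairs U+4E9C
    with the bytes 0x88 0x9F. This only covers the JIS X 0208 kanji
    listed in Unihan; kana, punctuation, and vendor extensions need a
    separate table.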

    > Why is that? Did TUC decide it no longer wanted to provide an official
    > mapping table? If so, why?

    Two reasons.

    1) Conversion between Unicode and Shift-JIS (for the non-ideographic
    portions of Unicode) is very complex because of the numerous dialects
    of Shift-JIS that are running wild out there. As a rule, the best way
    to find out how to convert between Unicode and a given character set is
    to get the data files from the people who do the conversion (e.g., MS,
    Apple) and see how they do it.

    2) Nobody was willing to maintain the official mapping.
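
    To make the dialect problem in point 1 concrete, the following
    Python sketch decodes the same bytes with two of the codecs that
    ship with CPython, "shift_jis" and "cp932" (the Microsoft variant).
    The helper name decode_variants is invented for this example, and
    the two sample byte sequences are only illustrations, not a complete
    list of differences.

```python
def decode_variants(raw: bytes, codecs=("shift_jis", "cp932")) -> dict:
    """Decode raw with each named codec; None marks bytes a codec rejects."""
    results = {}
    for name in codecs:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


if __name__ == "__main__":
    # 0x8160: a non-ideographic symbol the two codecs map differently.
    # 0x8740: a vendor-extension position outside plain JIS X 0208.
    for sample in (b"\x81\x60", b"\x87\x40"):
        decoded = decode_variants(sample)
        shown = {
            name: None if text is None else [f"U+{ord(ch):04X}" for ch in text]
            for name, text in decoded.items()
        }
        print(sample.hex(), shown)
```

    The two codecs disagree on both samples, which is the sort of thing
    a single official table could not settle for every platform.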

    John H. Jenkins

    This archive was generated by hypermail 2.1.5 : Thu Apr 01 2004 - 18:57:11 EST