Re: FW: UNICODE versus Shift-JIS

From: Edward Cherlin (edward.cherlin.sy.67@aya.yale.edu)
Date: Sat Jun 17 2000 - 22:17:09 EDT


At 8:39 AM -0800 6/16/00, Magda Danish (Unicode) wrote:
>Got this request by phone and email at the unicode home office. Could
>anyone respond directly to the list and cc to ken_buis@agilent.com
>
>Thanks. Magda.
>
>-----Original Message-----
>From: Ken Buis [mailto:kbuis@an.hp.com]
>Sent: Friday, June 16, 2000 9:11 AM
>To: info@unicode.org
>Cc: ken_buis@agilent.com
>Subject: UNICODE versus Shift-JIS
>
>
>Hello,
>
>I'm currently researching the effort involved with localizing a medical
>product to Japanese. This product is display-only, no data input is
>involved. Some people are suggesting I translate the user interface
>using the Shift-JIS character set, others support UNICODE.

What is the platform? Java? Windows CE? Something of Agilent's? It
makes a big difference. What fonts do you have?

>I'd
>like to know how the characters from the two sets map to each other.

See The Unicode Standard Version 3.0 (Addison-Wesley 2000) and CJKV
Information Processing, by Ken Lunde (O'Reilly 1999) for extensive
details. In particular, section 15.2 of the standard, Shift-JIS
Index, pp. 923-958, begins with these code points.

SJIS UNICODE
889F 4E9C
88A0 5516
88A1 5A03
88A2 963F
...
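
For a quick check of entries like these without the book at hand, here
is a minimal sketch in Python. It assumes that the "shift_jis" codec in
Python's standard library agrees with the published table for these
Kanji; the helper name sjis_to_unicode is my own, for illustration only.

```python
def sjis_to_unicode(code: int) -> int:
    """Return the Unicode code point for one two-byte Shift-JIS code."""
    raw = code.to_bytes(2, "big")      # e.g. 0x889F -> b'\x88\x9f'
    text = raw.decode("shift_jis")     # assumes Python's built-in codec
    return ord(text)

# Print the first few Kanji codes in the same two-column layout.
for sjis in (0x889F, 0x88A0, 0x88A1, 0x88A2):
    print(f"{sjis:04X} {sjis_to_unicode(sjis):04X}")
```

For production use, the vendor's own conversion tables may differ from
this codec in a handful of code points, so treat it as a spot check.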

>For
>example, would character #F123 in the Shift-JIS set be the same as
>character #F123 in the UNICODE set.

Neither standard has a character at that code point, but more
generally, the answer is No, there is no numeric correspondence, as
the brief quotation above shows.

>If not, are the characters stored in
>the same order in both sets, but at different offsets within each set.

No. SJIS does not have any simple ordering principle, since it
derives from encodings that have accreted blocks from several sources
over time. The order of the original CJK Unified block in Unicode is
derived from several merged radical/stroke count dictionary orders,
but other blocks will be added in future.
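
The difference in ordering is easy to see for yourself. The sketch
below, again assuming Python's built-in "shift_jis" codec, walks a few
consecutive SJIS Kanji codes and tests whether the Unicode code points
come out in ascending order. The function names are hypothetical.

```python
def unicode_points(first: int, last: int) -> list[int]:
    """Decode consecutive two-byte Shift-JIS codes to Unicode code points."""
    points = []
    for code in range(first, last + 1):
        text = code.to_bytes(2, "big").decode("shift_jis")
        points.append(ord(text))
    return points

def is_ascending(values: list[int]) -> bool:
    """True if every value is larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))

points = unicode_points(0x889F, 0x88A6)
print([f"{p:04X}" for p in points])
print("same order in both sets:", is_ascending(points))
```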

>For example, the Shift-JIS set starts at offset 0x000 and their
>equivalent characters start at offset 0xFF00 in the UNICODE set.

Hypothetically, you mean? Actually, the CJK Unified Ideographs block
starts at 4E00, and the SJIS Kanji start at 889F--with different
characters.

>If that
>is true, then character #0002 in the Shift-JIS set would be the same as
>character #FF02 in the UNICODE set.

The Unicode web site and the CD-ROM in the standard both contain
Shift-JIS/Unicode mapping tables suitable for use in software.
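
Loading such a table into a lookup structure takes only a few lines.
This is a sketch under an assumption about the file layout: one mapping
per line, with the SJIS code and the Unicode code point as hexadecimal
fields separated by whitespace, and "#" starting a comment. Check the
header of the actual file before relying on it. The sample text and the
name parse_mapping are hypothetical.

```python
# Hypothetical excerpt; the real file's layout may differ.
SAMPLE = """\
# SJIS    Unicode   comment
0x889F\t0x4E9C\t# <CJK>
0x88A0\t0x5516\t# <CJK>
0x88A1\t0x5A03\t# <CJK>
0x88A2\t0x963F\t# <CJK>
"""

def parse_mapping(text: str) -> dict[int, int]:
    """Build a dict from Shift-JIS code to Unicode code point."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blanks
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:                    # skip unmapped entries
            continue
        table[int(fields[0], 16)] = int(fields[1], 16)
    return table

table = parse_mapping(SAMPLE)
print(f"{table[0x889F]:04X}")
```

For a display-only product, a table like this, built once at startup or
compiled into the firmware, is all the conversion machinery needed.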

>Any assistance would be greatly appreciated.
>
>Ken Buis
>Agilent Technologies
>978-659-4859

Edward Cherlin
Generalist
"A knot!" exclaimed Alice. "Oh, do let me help to undo it."
Alice in Wonderland


