Re: Shift-JIS conversion.

From: Philippe Verdy
Date: Thu Nov 25 2004 - 14:26:47 CST


    ----- Original Message -----
    From: Addison Phillips [wM]
    To: pragati ;
    Sent: Thursday, November 25, 2004 6:21 PM
    Subject: RE: Shift-JIS conversion.

    Dear Pragati,

    You can write your own conversion, of course. The mapping tables for
    Unicode->SJIS are readily available. You should note that there are several
    vendor-specific variations in the mapping tables. Notably, Microsoft code
    page 932, which is often called Shift-JIS, has more characters in its
    character set than "standard" Shift-JIS (and it maps a few characters
    differently too...)
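
    To illustrate the vendor differences described above, here is a minimal
    Python sketch. It assumes the "shift_jis" and "cp932" codecs bundled with
    CPython as stand-ins for "standard" Shift-JIS and Microsoft code page 932;
    the helper names decode_both and can_encode are hypothetical and only
    serve this example.

```python
# Sketch: compare CPython's built-in "shift_jis" and "cp932" codecs.
# The helper names are illustrative; they are not part of any library.

def decode_both(raw):
    """Decode the same bytes with both codecs and return both results."""
    return raw.decode("shift_jis"), raw.decode("cp932")


def can_encode(text, codec):
    """Return True if every character of `text` exists in `codec`."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    # Same byte pair, different Unicode code point depending on the table.
    std, ms = decode_both(b"\x81\x60")
    print(f"0x8160 -> U+{ord(std):04X} (shift_jis), U+{ord(ms):04X} (cp932)")

    # CIRCLED DIGIT ONE is one of the vendor extensions present in cp932.
    for codec in ("shift_jis", "cp932"):
        print(codec, "can encode U+2460:", can_encode("\u2460", codec))
```

    A converter therefore has to decide which table the data really uses;
    text labelled "Shift-JIS" is often code page 932 in practice.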

    > The important fact that you should be aware of: Shift-JIS is an encoding
    > of the JIS X0208 character set.
    > UTF-8 is an encoding of the Unicode character set.

    More exactly, UTF-8 is an encoding of the ISO/IEC 10646 character set. The
    character set here designates the set of characters, i.e. the repertoire
    that describes each character with a name, a representative glyph and some
    annotations, and to which a numeric code, the code point, is then assigned.
    The char. set is

    Unicode by itself is not a character set, only an implementation of the
    ISO/IEC 10646 character set, in which the Unicode standard assigns
    additional properties and behavior to the characters allocated in ISO/IEC
    10646. The link between Unicode and ISO/IEC 10646 is the assigned code point
    and character name, which are now common to the two standards.
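
    The layering described above (a code point and name, properties added on
    top, and separate byte encodings) can be sketched in Python. This is only
    an illustration, assuming the standard unicodedata module and the built-in
    "utf-8" and "shift_jis" codecs; the function name describe is hypothetical.

```python
import unicodedata


def describe(ch):
    """Collect the different layers of information about one character."""
    return {
        # Code point and name: the part common to both standards.
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch),
        # A character property, taken from the Unicode Character Database.
        "general_category": unicodedata.category(ch),
        # Two unrelated byte encodings of the same abstract character.
        "utf-8": ch.encode("utf-8"),
        "shift_jis": ch.encode("shift_jis"),
    }


if __name__ == "__main__":
    for key, value in describe("\u3042").items():
        print(f"{key}: {value!r}")
```

    The code point and name stay the same whatever encoding is chosen; only
    the byte sequences differ.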

    Of course the Unicode Technical Committee may propose new assignments to
    ISO/IEC, but it is still ISO/IEC 10646 that maintains the repertoire and
    approves or rejects the proposals. A new character proposal may be rejected
    by Unicode but accepted by ISO/IEC 10646, and it is the ISO/IEC 10646 vote
    that prevails (so Unicode will have to accept this ISO/IEC decision, even if
    it voted against it in a prior decision).

    Conversely, ISO/IEC 10646 says nothing about character properties or
    behaviors. It can make suggestions, but the Unicode committee makes its own
    decisions about the character properties and behavior that it chooses to
    standardize. If Unicode wants its decisions to be widely accepted by all
    users of the ISO/IEC 10646 repertoire, it is in Unicode's interest to
    make these decisions conform to other existing national or international
    standards, to maximize interoperability of national and international
    applications based on the ISO/IEC 10646 character set.

    This archive was generated by hypermail 2.1.5 : Thu Nov 25 2004 - 14:29:38 CST