Re: JIS X 0208 mappings in Unihan.txt

From: Theo Veenker (Theo.Veenker@let.uu.nl)
Date: Thu Jun 02 2005 - 16:45:41 CDT


    Erik van der Poel wrote:
    > Theo Veenker wrote:
    >
    >> But I understand that there is a big difference between
    >> the 78 and 83 version. According to
    >> http://www.io.com/~kazushi/encoding/jis.html
    >> in the 83 version not only were characters added, but also many changed
    >> or exchanged (whatever that means).
    >
    >
    > The changed characters had their glyphs changed, often in small ways.
    > See p. 920-922 in CJKV. The exchanged characters involved swapping 22
    > pairs of simplified and traditional characters (p. 919, Category 3). The
    > "added and exchanged" characters involve 4 pairs of simplified and
    > traditional characters where the simplified form was not in 78 but is in
    > 83, at the position occupied by the traditional form in 78 (p. 919,
    > Category 2).
    >

    So a text originally composed using the 78 character set will look
    slightly different when using a later version of the character set,
    but otherwise be OK.
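
    To make the "exchanged" case concrete, here is a minimal sketch of how a
    converter might treat swapped pairs, assuming it works on (row, cell)
    positions. The pairs listed are hypothetical placeholders taken from
    unassigned rows, not the real 22 pairs from CJKV p. 919, and the
    function names are mine.

```python
# Sketch only: remap JIS X 0208-1978 (row, cell) positions to the 1983
# positions that carry the same glyph form, for the "exchanged" pairs.
# ASSUMPTION: the pairs below are hypothetical placeholders in unassigned
# rows; the real list of swapped pairs must come from a reference table.
SWAPPED_PAIRS = [
    ((90, 1), (91, 1)),
    ((90, 2), (91, 2)),
]


def build_swap_table(pairs):
    """Build a symmetric lookup: each member of a pair maps to the other."""
    table = {}
    for a, b in pairs:
        table[a] = b
        table[b] = a
    return table


SWAP_78_TO_83 = build_swap_table(SWAPPED_PAIRS)


def kuten_78_to_83(kuten):
    """Return the 83 position for a 78 position; unlisted positions are unchanged."""
    return SWAP_78_TO_83.get(kuten, kuten)


def remap_text(kutens):
    """Remap a whole sequence of (row, cell) positions."""
    return [kuten_78_to_83(k) for k in kutens]
```

    Without such a remapping step, a swapped position decoded with the 83
    table shows the other member of the pair rather than the form the
    author typed.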

    >> Characters were also changed for the 90 version.
    >> If this is really true, aren't the revised character sets
    >> incompatible with the earlier versions?
    >
    >
    > I suppose some people would consider these changes to be big enough to
    > call them "incompatible". I still don't know whether there are any high
    > quality Unicode mapping tables for the 78 and 83 versions. I thought
    > John Jenkins might make comments in this thread, but so far, no luck. I
    > did notice that the last pair of characters shown on CJKV p. 922
    > _appears_(?) to have been given separate Unicodes on p. 1201 of the
    > Unicode 4.0 book, near the bottom (Spoon radical, U+5315 and U+2090E).

    The latter is in the 'rare ideographs' section. According to Unihan.txt
    its sources are IRG G, KP and T. Apparently it was also contained in
    JIS X 0208-1983 but that is not one of the IRG sources (only the '90
    version is). If there are so many subtle changes between different
    versions of the same coded character set, wouldn't it make sense to have
    tags for all of these versions available in Unihan.txt? We have
    "completely accurate" JIS X 0208-1990 mappings in Unihan.txt, but
    using these one can only build a not completely accurate
    ISO-2022-JP[-1/2] processor. However, as you have indicated before, many
    implementations don't seem to care.
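
    For what it's worth, this is roughly how I look at the source tags; a
    small sketch, assuming the usual tab-separated "U+XXXX, tag, value"
    layout of Unihan.txt. The tag names (kJis0, kIRG_GSource,
    kIRG_KPSource, kIRG_TSource) are the ones Unihan uses, but every value
    in SAMPLE is a placeholder, not the real data, and the helper names
    are my own.

```python
from collections import defaultdict

# Unihan-style lines: code point, tag, value, separated by tabs.
# ASSUMPTION: all values are placeholders ("?"); only the tag names and
# the two code points come from the discussion above.
SAMPLE = [
    "# placeholder data, not real Unihan.txt content",
    "U+5315\tkJis0\t?",
    "U+2090E\tkIRG_GSource\t?",
    "U+2090E\tkIRG_KPSource\t?",
    "U+2090E\tkIRG_TSource\t?",
]


def parse_unihan(lines):
    """Parse Unihan-style lines into {code point: {tag: value}}."""
    db = defaultdict(dict)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        cp, tag, value = line.split("\t", 2)
        db[int(cp[2:], 16)][tag] = value
    return db


def irg_sources(entry):
    """List the IRG source letters (G, T, J, KP, ...) present for one entry."""
    prefix, suffix = "kIRG_", "Source"
    return sorted(
        tag[len(prefix):-len(suffix)]
        for tag in entry
        if tag.startswith(prefix) and tag.endswith(suffix)
    )


def lacking_tag(db, tag):
    """Return the code points that have no value for the given tag."""
    return sorted(cp for cp, entry in db.items() if tag not in entry)


db = parse_unihan(SAMPLE)
```

    With the sample above, irg_sources(db[0x2090E]) gives G, KP and T, and
    lacking_tag(db, "kJis0") reports U+2090E. A tag for the 78 or 83
    version would slot into the same lookup, but no such tag exists in the
    file today.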

    Theo


