Re: Comment on PRI 98: IVD Adobe-Japan1 (pt.2)

From: Andrew West (andrewcwest@gmail.com)
Date: Thu Mar 22 2007 - 05:46:02 CST

    On 22/03/07, mpsuzuki@hiroshima-u.ac.jp <mpsuzuki@hiroshima-u.ac.jp> wrote:
    >
    > As posting from Japan, I'm ashamed that I have to ask
    > about the popularity of Mojikyo character set. I heard
    > the "character number" (Japanese Tangutologists calls
    > as "Nishida number") of Tangut Ideographs might be
    > popular among Tangutology, and the numbering is used in
    > Mojikyo character set, but I've never heard about the
    > popularity of information interchange via coded text by
    > any non-Unicode encoding. If you're familiar, please
    > let me know.

    The French Tangutologist Guillaume Jacques uses the Mojikyo fonts for
    Tangut in his PDF documents:
    <http://xiang.free.fr/IATS2006.pdf> (Nouveau recueil sur la compassion
    et la piété filiale)
    <http://xiang.free.fr/origine.pdf> (Le poème sur l'origine des tangoutes)
    <http://xiang.free.fr/numeraux.pdf> (Les numéraux du tangoute)
    <http://xiang.free.fr/Or12380-19.pdf> (British Library, Or. 12380-19)

    Nishida uses the Mojikyo font for his PDF documents as well, but
    embedded as images:
    <http://www.iop.or.jp/0515/nishida2.pdf> (Xixia Language Studies and
    the Lotus Sutra)
    <http://www.iop.or.jp/0414/nishida.pdf> (On the Xixia Version of the
    Lotus Sutra)

    > I'm afraid that the roundtrip conversion
    > with transliterated resource can be expected, too.
    > In fact, I'm not sure Mojikyo-oriented solution can
    > handle non-BMP codepoints correctly.
    >

    Roundtripping to Mojikyo is difficult because Mojikyo is a "rich text"
    encoding that maps all characters to Shift-JIS ideographic codepoints,
    and you have to select the correct Mojikyo font to use for any given
    range of Mojikyo numbers. So for Tangut (Mojikyo numbers
    570001..576000), Mojikyo numbers 570001 through 575280 map to
    Shift-JIS codepoints 8A8F through E79E and need to be displayed using
    the M202 font, whereas Mojikyo numbers 575281 through 576000 map to
    Shift-JIS codepoints 889F through 8C7D and need to be displayed using
    the M203 font. However, it is possible to algorithmically determine
    the Shift-JIS codepoint and font to use for any given Mojikyo
    character from its six-digit Mojikyo number, so roundtripping is
    possible.
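
    As a rough sketch of the font-selection step described above, the
    following Python snippet looks up which font a Tangut Mojikyo number
    belongs to. It uses only the two ranges quoted in this message; the
    names TANGUT_RANGES and tangut_font are made up for illustration, and
    the calculation of the exact Shift-JIS codepoint within a range is
    left out because the layout is not spelled out here.

```python
# Ranges exactly as quoted in the message; everything else is illustrative.
# Each entry: (first Mojikyo number, last Mojikyo number, font name,
#              first Shift-JIS codepoint, last Shift-JIS codepoint)
TANGUT_RANGES = [
    (570001, 575280, "M202", 0x8A8F, 0xE79E),
    (575281, 576000, "M203", 0x889F, 0x8C7D),
]


def tangut_font(mojikyo_number):
    """Return the font and range information for a Tangut Mojikyo number.

    Hypothetical helper: it only selects the range and font. Computing the
    exact Shift-JIS codepoint inside the range is deliberately not attempted.
    """
    for first, last, font, sjis_first, sjis_last in TANGUT_RANGES:
        if first <= mojikyo_number <= last:
            return {
                "font": font,
                # Zero-based position of the character within its font range
                "offset": mojikyo_number - first,
                "sjis_first": sjis_first,
                "sjis_last": sjis_last,
            }
    raise ValueError(f"{mojikyo_number} is not in the Tangut range 570001..576000")


if __name__ == "__main__":
    for number in (570001, 575280, 575281, 576000):
        info = tangut_font(number)
        print(number, info["font"], info["offset"])
```

    The point of the sketch is that a Shift-JIS codepoint alone does not
    identify a Mojikyo character; the (font, codepoint) pair does, which
    is why the font has to travel with the text.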

    > If the different readings of same Tangut characters are so
    > important in information interchange among Tangutlogists
    > that the readings are not skippable, the idea of "Tangut
    > Compatibility Ideographs" will work better, I suppose.
    >

    They are not *so* important, but the Mojikyo set does include a
    significant number of duplicate characters due to the fact that the
    dictionary that the Mojikyo Tangut encoding is based on has separate
    entries for some characters that share the same glyph shape. In the
    current draft Tangut proposal the duplicate characters in Li Fanwen's
    dictionary (tXiaHan) and the Mojikyo set are not treated consistently.
    For example, the following pairs of characters with identical glyphs
    are proposed for encoding separately:

    Mojikyo/Li Fanwen #0406/0407 = U+17F62/U+17F63
    Mojikyo/Li Fanwen #3683/3684 = U+17D9D/U+17D9E
    Mojikyo/Li Fanwen #4456/4457 = U+17A37/U+17A38
    Mojikyo/Li Fanwen #5190/5191 = U+18236/U+18237

    whereas for Mojikyo/Li Fanwen #2298/2299, only one character is
    proposed (U+17A4D). And many examples of minor glyph variants in the
    Mojikyo/Li Fanwen set have not been included in the draft proposal,
    when, in my opinion, they should be dealt with in some way (either as
    separate codepoints or as variation sequences) to allow for full
    roundtripping.
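
    To make the roundtripping concern concrete, here is a small Python
    sketch that inverts the mapping listed above and reports any
    codepoint that more than one Mojikyo/Li Fanwen number maps to. The
    data is only what is quoted in this message (the codepoints are those
    of the draft proposal, and mapping both #2298 and #2299 to U+17A4D is
    my reading of "only one character is proposed"); DRAFT_MAPPING and
    lossy_codepoints are illustrative names.

```python
# Mojikyo/Li Fanwen number -> draft-proposal codepoint, as listed in the message.
# Assumption: #2298 and #2299 both map to the single proposed character U+17A4D.
DRAFT_MAPPING = {
    "0406": 0x17F62, "0407": 0x17F63,
    "3683": 0x17D9D, "3684": 0x17D9E,
    "4456": 0x17A37, "4457": 0x17A38,
    "5190": 0x18236, "5191": 0x18237,
    "2298": 0x17A4D, "2299": 0x17A4D,
}


def lossy_codepoints(mapping):
    """Return codepoints that more than one source number maps to.

    For any such codepoint the reverse conversion is ambiguous, so a
    roundtrip through that codepoint cannot recover the original number.
    """
    reverse = {}
    for number, codepoint in mapping.items():
        reverse.setdefault(codepoint, []).append(number)
    return {cp: numbers for cp, numbers in reverse.items() if len(numbers) > 1}


if __name__ == "__main__":
    for codepoint, numbers in lossy_codepoints(DRAFT_MAPPING).items():
        print(f"U+{codepoint:04X} <- {', '.join(numbers)}")
```

    On this data the only collision is at U+17A4D; the four pairs that
    are encoded separately convert back without loss.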

    I don't believe that encoding Tangut characters with identical glyph
    shapes (as in the current draft proposal) will meet with acceptance from
    the UTC or WG2, but I'm not sure if a VS solution would be acceptable
    either, as using variation sequences for identical glyphs seems to me
    to be extending the purpose of variation selectors beyond their
    current definition.

    Andrew


