Re: Comment on PRI 98: IVD Adobe-Japan1 (pt.2)

From: Andrew West (andrewcwest@gmail.com)
Date: Thu Mar 22 2007 - 05:46:02 CST

Next message: Andrew West: "Re: Comment on PRI 98: IVD Adobe-Japan1 (pt.2)"

Previous message: mpsuzuki@hiroshima-u.ac.jp: "Re: Comment on PRI 98: IVD Adobe-Japan1 (pt.2)"
In reply to: mpsuzuki@hiroshima-u.ac.jp: "Re: Comment on PRI 98: IVD Adobe-Japan1 (pt.2)"
Next in thread: Richard Wordingham: "Encoding Pronunciation (was: Comment on PRI 98: IVD Adobe-Japan1 (pt.2))"

On 22/03/07, mpsuzuki@hiroshima-u.ac.jp <mpsuzuki@hiroshima-u.ac.jp> wrote:
>
> As posting from Japan, I'm ashamed that I have to ask
> about the popularity of Mojikyo character set. I heard
> the "character number" (Japanese Tangutologists calls
> as "Nishida number") of Tangut Ideographs might be
> popular among Tangutology, and the numbering is used in
> Mojikyo character set, but I've never heard about the
> popularity of information interchange via coded text by
> any non-Unicode encoding. If you're familiar, please
> let me know.

The French Tangutologist Guillaume Jacques uses the Mojikyo fonts for
Tangut in his PDF documents:
<http://xiang.free.fr/IATS2006.pdf> (Nouveau recueil sur la compassion
et la piété filiale)
<http://xiang.free.fr/origine.pdf> (Le poème sur l'origine des tangoutes)
<http://xiang.free.fr/numeraux.pdf> (Les numéraux du tangoute)
<http://xiang.free.fr/Or12380-19.pdf> (British Library, Or. 12380-19)

Nishida uses the Mojikyo font for his PDF documents as well, but
embedded as images:
<http://www.iop.or.jp/0515/nishida2.pdf> (Xixia Language Studies and
the Lotus Sutra)
<http://www.iop.or.jp/0414/nishida.pdf> (On the Xixia Version of the
Lotus Sutra)

> I'm afraid that the roundtrip conversion
> with transliterated resource can be expected, too.
> In fact, I'm not sure Mojikyo-oriented solution can
> handle non-BMP codepoints correctly.
>

Roundtripping to Mojikyo is difficult because Mojikyo is a "rich text"
encoding that maps all characters to Shift-JIS ideographic codepoints,
and you have to select the correct Mojikyo font to use for any given
range of Mojikyo numbers. So for Tangut (Mojikyo numbers
570001..576000), Mojikyo numbers 570001 through 575280 map to
Shift-JIS codepoints 8A8F through E79E and need to be displayed using
the M202 font, whereas Mojikyo numbers 575281 through 576000 map to
Shift-JIS codepoints 889F through 8C7D and need to be displayed using
the M203 font. However, it is possible to algorithmically determine
the Shift-JIS codepoint and font to use for any given Mojikyo
character from its six-digit Mojikyo number, so roundtripping is
possible.
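
As a rough illustration of the lookup described above, here is a
minimal Python sketch. The two ranges and their Shift-JIS endpoints
are the ones given in this message. The function names (tangut_font,
kuten_to_sjis, m203_sjis) are hypothetical, kuten_to_sjis is the
ordinary JIS X 0208 row/cell to Shift-JIS conversion, and the idea
that characters in the M203 range run consecutively in row/cell order
starting from 889F is an assumption. That assumption agrees with the
stated endpoint 8C7D for 576000, but it has not been checked against
the fonts themselves.

```python
# Tangut ranges exactly as stated in the message:
# (first Mojikyo no., last Mojikyo no., font, first SJIS, last SJIS)
TANGUT_RANGES = [
    (570001, 575280, "M202", 0x8A8F, 0xE79E),
    (575281, 576000, "M203", 0x889F, 0x8C7D),
]


def tangut_font(mojikyo_number):
    """Return (font name, offset within its stated range) for a Tangut number."""
    for first, last, font, _first_sjis, _last_sjis in TANGUT_RANGES:
        if first <= mojikyo_number <= last:
            return font, mojikyo_number - first
    raise ValueError(f"{mojikyo_number} is outside 570001..576000")


def kuten_to_sjis(row, cell):
    """Convert a JIS X 0208 row/cell pair (both 1..94) to a Shift-JIS code."""
    lead = (row + 1) // 2 + (0x80 if row <= 62 else 0xC0)
    if row % 2:
        # Odd rows use trail bytes 0x40..0x9E, skipping 0x7F.
        trail = cell + 0x3F + (1 if cell >= 64 else 0)
    else:
        # Even rows use trail bytes 0x9F..0xFC.
        trail = cell + 0x9E
    return (lead << 8) | trail


def m203_sjis(mojikyo_number):
    """Shift-JIS code for a number in the M203 range.

    ASSUMPTION: consecutive row/cell order starting at row 16, cell 1 (889F).
    """
    font, offset = tangut_font(mojikyo_number)
    if font != "M203":
        raise ValueError("only the M203 range is handled in this sketch")
    return kuten_to_sjis(16 + offset // 94, offset % 94 + 1)


if __name__ == "__main__":
    print(tangut_font(570001))
    print(f"{m203_sjis(576000):04X}")
```

The M202 range is handled only as a font lookup here, since the
message gives just its endpoints and not the layout in between.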

> If the different readings of same Tangut characters are so
> important in information interchange among Tangutlogists
> that the readings are not skippable, the idea of "Tangut
> Compatibility Ideographs" will work better, I suppose.
>

They are not *so* important, but the Mojikyo set does include a
significant number of duplicate characters, because the dictionary
that the Mojikyo Tangut encoding is based on has separate entries for
some characters that share the same glyph shape. In the current draft
Tangut proposal the duplicate characters in Li Fanwen's dictionary
(tXiaHan) and the Mojikyo set are not treated consistently. For
example, the following pairs of characters with identical glyphs are
proposed for encoding separately:

Mojikyo/Li Fanwen #0406/0407 = U+17F62/U+17F63
Mojikyo/Li Fanwen #3683/3684 = U+17D9D/U+17D9E
Mojikyo/Li Fanwen #4456/4457 = U+17A37/U+17A38
Mojikyo/Li Fanwen #5190/5191 = U+18236/U+18237

whereas for Mojikyo/Li Fanwen #2298/2299, only one character is
proposed (U+17A4D). Many examples of minor glyph variants in the
Mojikyo/Li Fanwen set have also been left out of the draft proposal
when, in my opinion, they should be dealt with in some way (either as
separate codepoints or as variation sequences) to allow for full
roundtripping.

I don't believe that encoding Tangut characters with identical glyph
shapes (as in the current draft proposal) will meet with acceptance
from the UTC or WG2, but I'm not sure whether a VS solution would be
acceptable either, as using variation sequences for identical glyphs
seems to me to extend the purpose of variation selectors beyond their
current definition.

Andrew


This archive was generated by hypermail 2.1.5 : Thu Mar 22 2007 - 05:48:54 CST