Encoding Pronunciation (was: Comment on PRI 98: IVD Adobe-Japan1 (pt.2))

From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Thu Mar 22 2007 - 20:26:21 CST


    Eric Muller wrote on Wednesday, March 21, 2007 3:17 PM
    Subject: Re: Comment on PRI 98: IVD Adobe-Japan1 (pt.2)

    > The case of the pronunciation variants is a bit more delicate. With
    > today's understanding of what character encoding is about, I think it's
    > fair to say that accommodating pronunciation variants in plain text is a
    > non-goal, and in fact a misguided effort, in any character standard. Can
    > you imagine having two coded characters for each ideograph used in Japan,
    > one for On reading and one for Kun reading?

    But don't we already have something like that for Welsh and Slovak? The
    lower case Welsh letter 'ng', which represents a velar nasal, is encoded as
    <U+006E LATIN SMALL LETTER N, U+0067 LATIN SMALL LETTER G> (e.g. Angharad),
    while the 'coincidental' occurrence of a nasal and a voiced velar stop
    should be encoded as <U+006E, U+034F COMBINING GRAPHEME JOINER, U+0067>
    (e.g. Bangor and Llangollen) if you want it to collate properly without
    dictionary look-ups. (Without CGJ, 'Llangollen' would collate before
    'Llanberis', as the letter 'ng' comes between 'g' and 'h' in the Welsh
    alphabet and so sorts before 'n'.) I believe that the
    distinction between <U+17D2 KHMER SIGN COENG, U+178A KHMER LETTER DA> and
    <U+17D2 KHMER SIGN COENG, U+178F KHMER LETTER TA> is likewise phonetic
    (rather than etymological), but I can no longer find the definition of the
    difference between these two graphically identical sequences. The crucial
    point in at least the Welsh and Slovak cases is that the difference affects
    collation order.
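
To make the Welsh case concrete, here is a minimal sketch of a digraph-aware sort key in which CGJ blocks the 'ng' contraction. It is a toy model, not the Unicode Collation Algorithm or any CLDR tailoring: the alphabet table, the name welsh_key, and the choice to rank unknown characters last are all assumptions made for illustration, and it handles only a single (primary-like) level.

```python
CGJ = "\u034f"  # U+034F COMBINING GRAPHEME JOINER

# Traditional Welsh alphabet order; digraphs count as single letters.
# (Illustrative table only, not taken from any collation standard.)
WELSH_ALPHABET = [
    "a", "b", "c", "ch", "d", "dd", "e", "f", "ff", "g", "ng", "h",
    "i", "j", "l", "ll", "m", "n", "o", "p", "ph", "r", "rh", "s",
    "t", "th", "u", "w", "y",
]
RANK = {letter: i for i, letter in enumerate(WELSH_ALPHABET)}
DIGRAPHS = {letter for letter in WELSH_ALPHABET if len(letter) == 2}


def welsh_key(word):
    """Return a list of letter ranks usable as a sort key (hypothetical helper)."""
    text = word.lower()
    key = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in DIGRAPHS:
            # Adjacent letters form a digraph: treat them as one letter.
            key.append(RANK[pair])
            i += 2
        elif text[i] == CGJ:
            # CGJ has no weight of its own; its only effect is that the
            # letters on either side were not adjacent, so no digraph formed.
            i += 1
        else:
            # Characters outside the table sort after all Welsh letters.
            key.append(RANK.get(text[i], len(WELSH_ALPHABET)))
            i += 1
    return key


places = ["Llangollen", "Llan\u034fgollen", "Llanberis"]
ordered = sorted(places, key=welsh_key)
```

With this key, the spelling without CGJ sorts before 'Llanberis' (the third letter is 'ng', which precedes 'n'), while the spelling with CGJ sorts after it (the letters are 'n' then 'g', and 'b' precedes 'g').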

    While on this subject, is there a recommended way of distinguishing in the
    encoding the Khmer letter ba pronounced /b/ and the Khmer letter ba
    pronounced /p/ (as in many Indic loans) when they precede vowels? In Khmer
    the latter sorts equal to <U+1794 KHMER LETTER BA, U+17C9 KHMER SIGN
    MUUSIKATOAN> at the primary level. There has been a discussion on Khmer
    collation, but I couldn't find a resolution of this issue.

