Unihan Vietnamese Readings

From: Andrew C. West (andrewcwest@alumni.princeton.edu)
Date: Tue Nov 25 2003 - 09:46:03 EST


    I've been looking at the Vietnamese readings given in the Unihan database
    recently, and although I don't know Vietnamese, I think there may be something
    not quite right with some of them, and so I wondered if anyone on this list who
    knows Vietnamese could confirm the validity of the Unihan Vietnamese readings.

    Since Unicode 3.2 the Unihan database has included Vietnamese Nôm readings for
    164 basic CJK ideographs (from U+66F2 up to U+9C31, which is odd in itself), 122
    CJK-A ideographs, and 4,230 CJK-B ideographs. The Vietnamese readings for the
    CJK-A and CJK-B ideographs look like phonetic variations on the original Chinese
    pronunciations of the ideographs (as would be expected), but none of the
    Vietnamese readings for the 164 basic CJK ideographs bear any correspondence
    with the Chinese pronunciations for the same ideographs.
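
    For anyone who wants to reproduce these per-block counts, here is a minimal
    Python sketch. It assumes the tab-separated Unihan.txt layout (code point,
    field name, value, with "#" comment lines) and the assigned block ranges as
    of Unicode 3.2. The names block_of and count_vietnamese are hypothetical,
    and the sample lines are only the three entries quoted in this message, not
    the real file.

```python
from collections import Counter

# Assigned ranges as of Unicode 3.2 (assumption: adjust for later versions).
BLOCKS = [
    ("CJK", 0x4E00, 0x9FA5),
    ("CJK-A", 0x3400, 0x4DB5),
    ("CJK-B", 0x20000, 0x2A6D6),
]


def block_of(cp):
    """Return the block label for a code point, or "other"."""
    for name, lo, hi in BLOCKS:
        if lo <= cp <= hi:
            return name
    return "other"


def count_vietnamese(lines):
    """Count kVietnamese entries per block in Unihan-style lines."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) != 3 or fields[1] != "kVietnamese":
            continue  # skip other fields and malformed lines
        cp = int(fields[0][2:], 16)  # strip the leading "U+"
        counts[block_of(cp)] += 1
    return counts


# Illustrative input only: the three entries quoted in this message.
sample = [
    "# illustrative sample, not the real file",
    "U+66F2\tkVietnamese\tgi\u1ea3",
    "U+66F4\tkVietnamese\tx\u00e2u",
    "U+6771\tkVietnamese\th\u1ed1c",
]

if __name__ == "__main__":
    for name, n in sorted(count_vietnamese(sample).items()):
        print(name, n)
```

    Run against the full file (for example with open("Unihan.txt",
    encoding="utf-8")), this should show how the entries are distributed across
    the three blocks.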

    I used the excellent Nôm Lookup Tool provided by the Nôm Foundation
    (http://www.nomfoundation.org/nomdb/lookup.php) to check the Vietnamese readings
    given in the Unihan database, and found that the Nôm readings for a random
    sample of CJK-A and CJK-B ideographs exactly matched the readings given in the
    Unihan database. On the other hand, none of the readings given by the Nôm Lookup
    Tool for basic CJK ideographs (between U+66F2 and U+9C31) matched the readings
    given in the Unihan database.

    For example, the Unihan database has the following readings for these three
    basic CJK ideographs:

    U+66F2 kVietnamese giả <U+0067, U+0069, U+1EA3>
    U+66F4 kVietnamese xâu <U+0078, U+00E2, U+0075>
    U+6771 kVietnamese hốc <U+0068, U+1ED1, U+0063>

    On the other hand the Nôm Lookup Tool gives the following readings for the same
    ideographs:

    U+66F2 = khúc <U+006B, U+0068, U+00FA, U+0063>
    U+66F4 = canh <U+0063, U+0061, U+006E, U+0068>
    U+6771 = đông <U+0111, U+00F4, U+006E, U+0067>

    And looking up the Unihan Vietnamese readings for these three ideographs with
    the Nôm Lookup Tool gives the following results:

    giả = U+4F3D or U+5047 or U+5056 or U+8005 or U+8D6D
    xâu = U+507B or U+641C or U+22D1C or U+22E64 or U+26113
    hốc = U+561D or U+21417
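
    The comparison in the examples above can be sketched in Python as follows.
    The two dictionaries hold only the three readings quoted in this message;
    the names unihan, nom_tool, code_points and mismatches are hypothetical.
    Readings are normalized to NFC before comparing, on the assumption that
    either source might use decomposed tone marks.

```python
import unicodedata

# Readings quoted in this message, keyed by ideograph code point.
unihan = {0x66F2: "gi\u1ea3", 0x66F4: "x\u00e2u", 0x6771: "h\u1ed1c"}
nom_tool = {0x66F2: "kh\u00fac", 0x66F4: "canh", 0x6771: "\u0111\u00f4ng"}


def nfc(s):
    """Normalize to NFC so composed and decomposed forms compare equal."""
    return unicodedata.normalize("NFC", s)


def code_points(s):
    """Format a reading as a list of code points, e.g. 'U+0067, U+0069'."""
    return ", ".join(f"U+{ord(ch):04X}" for ch in nfc(s))


def mismatches(a, b):
    """Return (code point, reading in a, reading in b) where the two differ."""
    shared = sorted(a.keys() & b.keys())
    return [(cp, a[cp], b[cp]) for cp in shared if nfc(a[cp]) != nfc(b[cp])]


if __name__ == "__main__":
    for cp, left, right in mismatches(unihan, nom_tool):
        print(f"U+{cp:04X}: Unihan {left} <{code_points(left)}> "
              f"vs Nôm tool {right} <{code_points(right)}>")
```

    For these three ideographs every pair differs, which is the discrepancy
    described above.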

    Can anyone tell me whether this discrepancy between the Unihan Vietnamese
    readings and the readings given by the Nôm Lookup Tool is due to an error in the
    Unihan database or due to my lack of understanding of Vietnamese?

    Andrew


