Re: Status of Unihan Mandarin readings?

From: John H. Jenkins (jenkins@apple.com)
Date: Thu Dec 19 2002 - 11:06:13 EST


    On Thursday, December 19, 2002, at 06:05 AM, Andrew C. West wrote:

    >
    >> - Any estimates for when it will be possible to publish a fixed version?
    >
    > I'll let Mr. Jenkins answer that one.
    >

    Unicode 4.0 timeframe. We'll also try to get the preferred Mandarin
    (and possibly Cantonese) readings marked by then.

    >> - Any suggestion for interim work-arounds (e.g., an older version of
    >> the file, an alternative source)?
    >
    > Use the Unihan database for Unicode 3.0 at
    > http://www.unicode.org/Public/3.0-Update/Unihan-3.txt
    >
    > This is the latest uncorrupted version.
    >

    This is what's weird about this whole thing. I can't figure out how
    the corruption took place between Unicode 3.0 and 3.1. At least it'll
    make it easier to fix.
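
    For anyone taking the Unicode 3.0 file as the interim source, here is
    a minimal sketch of pulling the Mandarin readings out of it. It
    assumes the file uses tab-separated lines of the form
    "U+XXXX<TAB>field<TAB>value", with "#" starting a comment line and
    multiple readings separated by spaces. The function name and the
    sample lines are hypothetical, made up to illustrate the format rather
    than copied from the file.

```python
def parse_unihan_readings(lines, field="kMandarin"):
    """Collect the readings for one field from Unihan-style lines.

    Assumes each data line is "U+XXXX<TAB>field<TAB>value" and that
    multiple readings in a value are separated by spaces.
    """
    readings = {}
    for line in lines:
        line = line.rstrip("\r\n")
        # Skip blank lines and "#" comment lines.
        if not line or line.startswith("#"):
            continue
        parts = line.split("\t", 2)
        if len(parts) != 3:
            continue  # not a three-column data line
        codepoint, tag, value = parts
        if tag != field or not codepoint.startswith("U+"):
            continue
        # Convert "U+4E00" to the character it names.
        readings[chr(int(codepoint[2:], 16))] = value.split()
    return readings


# Illustrative lines in the assumed format (not real file contents).
sample = [
    "# sample comment line",
    "U+4E00\tkMandarin\tYI1",
    "U+4E00\tkCantonese\tJAT1",
    "U+884C\tkMandarin\tXING2 HANG2",
]
mandarin = parse_unihan_readings(sample)
```

    To run this against a locally saved copy of the file, pass an open
    file object in place of the sample list; the appropriate encoding to
    open it with is an assumption you should check against the file
    itself.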

    Meanwhile, one caveat regarding the pronunciations supplied in the
    Unihan database. While we do try to be accurate and careful and while
    we do try to use reliable sources, we are not lexicographers ourselves,
    and there's not much we can do when our sources don't agree. For
    Mandarin this is a fairly minor problem, but it's a bit more extensive
    for Cantonese. One cause of this is that languages are moving targets,
    and the pronunciations themselves can change over time. Another is
    that sometimes people extrapolate the pronunciation for one dialect
    from the pronunciation in another, or from the pronunciation given in
    a classical dictionary such as the KangXi. And, for Cantonese in
    particular, sometimes characters are new enough that we can't go to
    dictionaries but have to rely on the "man in the street" for the
    pronunciation (we had a case like this come up in the last IRG). And
    sometimes our fingers just trip over each other while we type.

    While I think the readings we provide are useful and an important
    adjunct to the Unihan database, I'm not sure I'd want to use these
    readings if I were developing a commercial-grade product or writing a
    scholarly treatise.

    ==========
    John H. Jenkins
    jenkins@apple.com
    jhjenkins@mac.com
    http://www.tejat.net/


