Re: Medieval CJK race-horse names (was Re: Bantu click letters )

From: Kenneth Whistler (kenw@sybase.com)
Date: Fri Jun 11 2004 - 14:20:59 CDT

    > On Jun 11, 2004, at 6:44 AM, Andrew C. West wrote:
    >
    > > Despite the oft-mentioned cutesy Hong Kong race horse names,
    > > idiosyncratic invented Han ideographs are a negligible component
    > > of the encoded CJK repertoire. In my opinion there are thousands,
    > > possibly tens of thousands, of ideographs that should not really
    > > have been encoded individually as they are simply minor glyph
    > > variants, frequently only attested in a single source because the
    > > author simply wrote the character wrongly in the first place. This
    > > is the real issue with the over-encoding of CJKV, not the
    > > occasional race horse name.

    >
    > In particular, the decision to import en masse the repertoire of the
    > Hanyu Da Zidian was not a wise one, as a substantial number of the
    > entries are of the form "same as X".

    Andrew and John have correctly identified the bulk of the problem
    for CJKV overencoding.

    Unfortunately, given the nature of the Han script and the
    historical practice of Chinese lexicography, the result we
    have ended up with is almost inevitable.

    These historic mistakes, minor glyph variants, and such got
    carried into scholastic compendia *as characters*, where they
    became lexical headwords, repeated ad infinitum, in each
    further edition and each new compendium. The fact that they
    got carried into the Hanyu Da Zidian, the Chinese moral
    equivalent of the Oxford English Dictionary, means that
    inevitably they end up in the character encoding, as digital
    representation of the Hanyu Da Zidian is absolutely required.
    Leaving some out, no matter how mistaken or obsolete, would,
    from the Chinese point of view, be like deciding to leave
    some obsolete word out of the OED simply because there
    wasn't a "character" encoded for it.

    It would have been nice if a better mechanism for expressing
    Han glyphic (and other types of) variants had been feasible
    and in place before CJK Extension B went in, but that is
    water under the bridge now. One can only hope that some
    restraint and use of alternative mechanisms will be shown
    in the current effort to define and encode additional CJK
    extensions, which involve even *less* useful characters, for
    the most part, missed even by the major dictionary compendia.

    --Ken