Re: [unicode] Unihan database: kCangjie field

From: John H. Jenkins (jenkins@apple.com)
Date: Sun Jun 14 2009 - 20:45:05 CDT


    If someone is willing to do the work to contact these people, get
    their permission, write up a document for the UTC describing the
    data, and provide Richard Cook or me with the actual data, then I
    don't think that there would be any real problem with adding it.

    Basically, here as elsewhere, the actual work involved is likely to be
    more time-consuming than one thinks and neither Dr. Cook nor I have as
    much time as we would like to devote to it. The best way to see that
    something makes it into the Unihan database is to do the work of data
    collection for us.

    On Jun 15, 2009, at 1:57 AM, Charlie Ruland wrote:

    > If it is true that the Unihan database has Cangjie v.3 input codes
    > for only 29,148 characters, whereas Malaysia’s Friends of Cangjie
    > have Cangjie v.5 codes for all CJK[V] unified ideographs of Unicode
    > 4.0, why not add a “kCangjie5” field based on the more exhaustive
    > data from Malaysia to the Unihan database (or, entirely replace the
    > Cangjie v.3 data of the “kCangjie” field with the Cangjie v.5
    > data)?
    >
    > BTW, Malaysia’s Friends of Cangjie seem to be willing to have their
    > data published: e.g., the English Wiktionary has the page http://en.wiktionary.org/wiki/Wiktionary:Chinese_Cangjie_index
    > where it says: “Cāngjié data was taken from www.chinesecj.com
    > with permission.”
    >
    > Charlie
    >
    > -------- Original Message --------
    > Subject: Re: [unicode] Unihan database: kCangjie field
    > From: mpsuzuki@hiroshima-u.ac.jp
    > To: Charlie Ruland <ruland@luckymail.com>
    > Date: Sun Jun 14 2009 07:30:59 GMT+0200
    >> Hi,
    >>
    >> Checking the kCangjie entry for U+9762 (面) in Unihan.txt,
    >> we can find this line:
    >>
    >> U+9762 kCangjie MWYL
    >>
    >> I guess this is Cangjie version 3 style.
    >> If it were version 5 style, it would be MWSL.
    >>
    >> http://zh.wikipedia.org/wiki/%E5%80%89%E9%A0%A1%E8%BC%B8%E5%85%A5%E6%B3%95
    >>
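The lookup described above can be sketched in a few lines of Python. This is only an illustration: `parse_unihan_line` and `lookup` are hypothetical helper names, and the sketch assumes whitespace-separated (normally tab-separated) `U+XXXX tag value` lines with `#` comment lines, as in the distributed Unihan files.

```python
def parse_unihan_line(line):
    """Split one Unihan-style line into (code point, tag, value), or None."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # blank line or comment
    parts = line.split(None, 2)  # code point, tag, rest of line as value
    if len(parts) != 3 or not parts[0].startswith("U+"):
        return None
    return int(parts[0][2:], 16), parts[1], parts[2]


def lookup(lines, char, tag="kCangjie"):
    """Return the value of `tag` for `char`, or None if there is no entry."""
    target = ord(char)
    for line in lines:
        record = parse_unihan_line(line)
        if record and record[0] == target and record[1] == tag:
            return record[2]
    return None


# Sample data: the single line quoted in this thread.
sample = ["# sample comment line", "U+9762\tkCangjie\tMWYL"]
print(lookup(sample, "\u9762"))
```

With a real copy of the data, `lines` could be an open file object; the returned code could then be compared by hand with what a version 5 IME expects (MWSL for this character, per the message above).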
    >> According to UAX #38, the kCangjie field is based on Christian
    >> Wittern's cangjie-table.b5.
    >>
    >>
    >>> Tag: kCangjie
    >>> Status: Provisional
    >>> Category: Dictionary-like Data
    >>> Separator: space
    >>> Syntax: [A-Z]+
    >>> Description: The cangjie input code for the character.
    >>> This incorporates data from the file cangjie-table.b5
    >>> by Christian Wittern.
    >>>
    >>
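The Syntax and Separator lines of that field description translate directly into a check. A minimal sketch, assuming a value is one or more space-separated codes, each of which must match `[A-Z]+` in full; `invalid_codes` is a hypothetical helper name, not anything defined by the Unihan documentation.

```python
import re

# Syntax from the field description quoted above.
CANGJIE_SYNTAX = re.compile(r"[A-Z]+")


def invalid_codes(value):
    """Return the space-separated codes in `value` that violate the syntax."""
    return [code for code in value.split(" ")
            if not CANGJIE_SYNTAX.fullmatch(code)]


print(invalid_codes("MWYL"))
print(invalid_codes("MWYL mwsl"))
```

Note that such a check says nothing about which Cangjie version a code follows: MWYL and MWSL both satisfy the syntax.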
    >> According to Christian Wittern's web site at Kyoto Univ.,
    >> it seems that he has not updated cangjie-table.b5 since
    >> November 1993.
    >>
    >> http://kanji.zinbun.kyoto-u.ac.jp/~wittern/publications/data/index.html
    >>
    >>> Cangjie Table: Table of all cangjie input keys,
    >>> with radical / stroke and BIG5 code ,
    >>> in: ftp://ifcss.org/software/data, November 1993.
    >>>
    >>
    >> I think the version of cangjie-table.b5 commonly used in
    >> various free software packages is 1.02, released in May 1993,
    >> e.g.
    >> http://linenum.info/p/emacs/22.1/leim/MISC-DIC/cangjie-table.b5?page=1
    >> http://linenum.info/p/emacs/22.1/leim/MISC-DIC/cangjie-table.b5?page=27
    >> It includes 13,059 entries, covering Big5 with the ETen extension.
    >>
    >> On the other hand, Unihan.txt 5.1.0 (2008-Mar-03) includes
    >> 29,148 kCangjie entries. I don't know who added the extra
    >> kCangjie values covering the characters that are not included
    >> in Christian's original cangjie-table.b5.
    >>
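An entry count like the one above can be reproduced with a short script. This is a sketch under the same format assumption (whitespace-separated `U+XXXX tag value` lines); `count_tag` is a hypothetical helper, and the sample lines other than the U+9762 one carry made-up placeholder values.

```python
def count_tag(lines, tag="kCangjie"):
    """Count the distinct code points that carry `tag`."""
    seen = set()
    for line in lines:
        fields = line.split()
        # Comment and blank lines fail the "U+" test and are skipped.
        if len(fields) >= 3 and fields[0].startswith("U+") and fields[1] == tag:
            seen.add(fields[0])
    return len(seen)


# Placeholder data; only the U+9762 line comes from this thread.
sample = [
    "# comment",
    "U+9762\tkCangjie\tMWYL",
    "U+9762\tkOtherTag\tplaceholder",
    "U+4E00\tkCangjie\tPLACEHOLDER",
]
print(count_tag(sample))
```

Passing an open copy of Unihan.txt 5.1.0 as `lines` should presumably give the figure reported above, though that has not been verified here.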
    >> Regards,
    >> mpsuzuki
    >>
    >> On Sat, 13 Jun 2009 19:14:49 +0200
    >> Charlie Ruland <ruland@luckymail.com> wrote:
    >>
    >>
    >>> Which version of Cangjie do the input codes given in the
    >>> Unihan database follow?
    >>> I couldn't find any explicit information on this in Unicode
    >>> Standard Annex #38: Unicode Han Database (Unihan)
    >>> (http://www.unicode.org/reports/tr38/).
    >>> FYI, I use a Cangjie version 5 IME (第五代倉頡輸入法) designed by
    >>> and downloaded from Malaysia’s Friends of Cangjie (倉頡之友。馬來西亞,
    >>> http://www.chinesecj.com/newsoftware/index3.php?Type=1 ), which
    >>> promises to support input of some 70,000 characters.
    >>> Are all Unihan kCangjie codes usable on my IME?
    >>>
    >>> Charlie
    >>>
    >>> --
    >>> ___ Charlie Ruland ___ 冉書慧 ___
    >>> ERROR__COMMVNIS__FACIT__IVS
    >>>
    >>>
    >>>
    >>
    >>
    >>
    >
    > --
    > — Charlie Ruland — 冉書慧 —
    > ERROR·COMMVNIS·FACIT·IVS
    >
    >

    =====
    John H. Jenkins
    jenkins@apple.com


