Re: [unicode] Unihan database: kCangjie field

From: mpsuzuki@hiroshima-u.ac.jp
Date: Sun Jun 14 2009 - 21:20:40 CDT


    Hi,

    If data submitted for UTR#38 must be licensed to permit
    certain typical uses, please let me/him know. The
    preamble of Unihan.txt refers to the generic terms of use
    for data on the Unicode site, as follows:

            For terms of use, see <http://www.unicode.org/terms_of_use.html>

    These terms are very permissive: the data may be
    redistributed, modified, and used in any product.

    The main text of UTR#38 acknowledges that the data in the
    kCantonese field are copyrighted but may be used in any
    product, as long as the copyright acknowledgment is
    included.

    I doubt whether the copyright holders of the Cangjie v5
    data would be willing to publish their data under such a
    permissive license.

    Regards,
    mpsuzuki

    On Mon, 15 Jun 2009 09:45:05 +0800
    "John H. Jenkins" <jenkins@apple.com> wrote:

    >If someone is willing to do the work to contact these people, get
    >their permission, and write up a document for the UTC describing the
    >data and provide Richard Cook or me with the actual data, then I don't
    >think that there would be any real problem to adding it.
    >
    >Basically, here as elsewhere, the actual work involved is likely to be
    >more time-consuming than one thinks and neither Dr. Cook nor I have as
    >much time as we would like to devote to it. The best way to see that
    >something makes it into the Unihan database is to do the work of data
    >collection for us.
    >
    >On Jun 15, 2009 1:57 AM, Charlie Ruland wrote:
    >
    >> If it is true that the Unihan database has Cangjie v.3 input codes
    >> for only 29,148 characters, whereas Malaysia's Friends of Cangjie
    >> have Cangjie v.5 codes for all CJK[V] unified ideographs of Unicode
    >> 4.0, why not add a "kCangjie5" field based on the more exhaustive
    >> data from Malaysia to the Unihan database (or, entirely replace the
    >> Cangjie v.3 data of the "kCangjie" field with the Cangjie v.5
    >> data)?
    >>
    >> BTW, Malaysia's Friends of Cangjie seem to be willing to have their
    >> data published: e.g., the English Wiktionary has the page
    >> http://en.wiktionary.org/wiki/Wiktionary:Chinese_Cangjie_index
    >> where it says: "Cangjie data was taken from www.chinesecj.com
    >> with permission."
    >>
    >> Charlie
    >>
    >> -------- Original Message --------
    >> Subject: Re: [unicode] Unihan database: kCangjie field
    >> From: mpsuzuki@hiroshima-u.ac.jp
    >> To: Charlie Ruland <ruland@luckymail.com>
    >> Date: Sun Jun 14 2009 07:30:59 GMT+0200
    >>> Hi,
    >>>
    >>> Checking the kCangjie entry for U+9762 (面) in Unihan.txt,
    >>> we can find this line:
    >>>
    >>> U+9762 kCangjie MWYL
    >>>
    >>> I guess this is the Cangjie version 3 code;
    >>> in version 5, it would be MWSL.
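A minimal Python sketch of reading the record quoted above. It assumes the usual Unihan.txt layout: one `U+XXXX`, field, value record per line (tab-separated), with `#` comment lines. The names `UnihanEntry` and `parse_unihan_line` are hypothetical and do not come from any Unicode tool. The v5 code MWSL is only the value stated in this message; it is not read from any data file.

```python
from typing import NamedTuple, Optional


class UnihanEntry(NamedTuple):
    codepoint: int  # e.g. 0x9762
    field: str      # e.g. "kCangjie"
    value: str      # raw value; may hold several space-separated codes


def parse_unihan_line(line: str) -> Optional[UnihanEntry]:
    """Parse one Unihan.txt-style record; return None for blank/comment lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    # Split on any whitespace (tabs or spaces), at most twice, so a
    # multi-valued field such as "MW YL" stays in one piece.
    cp, field, value = line.split(None, 2)
    if not cp.startswith("U+"):
        raise ValueError(f"unexpected code point: {cp!r}")
    return UnihanEntry(int(cp[2:], 16), field, value)


entry = parse_unihan_line("U+9762\tkCangjie\tMWYL")
print(f"U+{entry.codepoint:04X}", entry.field, entry.value)  # U+9762 kCangjie MWYL

# Codes for U+9762 as given in this thread: Unihan (v3) vs. the stated v5 code.
v3_code, v5_code = entry.value, "MWSL"
print(v3_code == v5_code)  # False
```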
    >>>
    >>> http://zh.wikipedia.org/wiki/%E5%80%89%E9%A0%A1%E8%BC%B8%E5%85%A5%E6%B3%95
    >>>
    >>> According to UTR#38, the kCangjie field is based on Christian
    >>> Wittern's cangjie-table.b5.
    >>>
    >>>
    >>>> Tag: kCangjie
    >>>> Status: Provisional
    >>>> Category: Dictionary-like Data
    >>>> Separator: space
    >>>> Syntax: [A-Z]+
    >>>> Description: The cangjie input code for the character.
    >>>> This incorporates data from the file cangjie-table.b5
    >>>> by Christian Wittern.
    >>>>
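The Separator and Syntax lines in the field description above can be checked mechanically. Below is a minimal Python sketch. It assumes a kCangjie value is one or more `[A-Z]+` codes joined by single spaces. The helper name `split_kcangjie` is hypothetical, and the sketch is deliberately strict: an empty value or a doubled space is rejected.

```python
import re

# From the quoted field description: Separator "space", Syntax "[A-Z]+".
CANGJIE_CODE = re.compile(r"[A-Z]+")


def split_kcangjie(value: str) -> list[str]:
    """Split a kCangjie value on single spaces and check each code."""
    codes = value.split(" ")
    for code in codes:
        # An empty string (from "" or a double space) also fails here.
        if not CANGJIE_CODE.fullmatch(code):
            raise ValueError(f"invalid kCangjie code: {code!r}")
    return codes


print(split_kcangjie("MWYL"))  # ['MWYL']
```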
    >>>
    >>> According to Christian Wittern's web site at Kyoto Univ.,
    >>> it seems that he has not updated cangjie-table.b5 since
    >>> November 1993.
    >>>
    >>> http://kanji.zinbun.kyoto-u.ac.jp/~wittern/publications/data/index.html
    >>>
    >>>> Cangjie Table: Table of all cangjie input keys,
    >>>> with radical / stroke and BIG5 code ,
    >>>> in: ftp://ifcss.org/software/data, November 1993.
    >>>>
    >>>
    >>> I think the version of cangjie-table.b5 commonly used in
    >>> various free software packages is 1.02, released in May 1993,
    >>> e.g.:
    >>> http://linenum.info/p/emacs/22.1/leim/MISC-DIC/cangjie-table.b5?page=1
    >>> http://linenum.info/p/emacs/22.1/leim/MISC-DIC/cangjie-table.b5?page=27
    >>> It includes 13,059 entries, covering Big5 with the ETen extensions.
    >>>
    >>> On the other hand, Unihan.txt 5.1.0 (2008-03-03) includes
    >>> 29,148 entries. I don't know who added the extra kCangjie
    >>> entries for the characters that are not in Christian's
    >>> original cangjie-table.b5.
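Entry counts like the two above could be reproduced by counting records per field. Below is a minimal sketch. It assumes the Unihan.txt layout of one `U+XXXX`, field, value record per line, with `#` comment lines. The name `count_fields` and the local file path are hypothetical. The sample holds only the U+9762 record quoted earlier, and no count from real data is claimed.

```python
from collections import Counter
from typing import Iterable


def count_fields(lines: Iterable[str]) -> Counter:
    """Count records per field name in Unihan.txt-style lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 2)
        # Only count well-formed "U+XXXX field value" records.
        if len(parts) == 3 and parts[0].startswith("U+"):
            counts[parts[1]] += 1
    return counts


sample = ["# sample header", "", "U+9762\tkCangjie\tMWYL"]
print(count_fields(sample)["kCangjie"])  # 1

# Against a real local copy (path is an assumption):
# with open("Unihan.txt", encoding="utf-8") as f:
#     print(count_fields(f)["kCangjie"])
```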
    >>>
    >>> Regards,
    >>> mpsuzuki
    >>>
    >>> On Sat, 13 Jun 2009 19:14:49 +0200
    >>> Charlie Ruland <ruland@luckymail.com> wrote:
    >>>
    >>>
    >>>> Which Cangjie version's input codes are given in the
    >>>> Unihan database? I couldn't find any explicit information
    >>>> on this in Unicode Standard Annex #38: Unicode Han Database
    >>>> (Unihan) at http://www.unicode.org/reports/tr38/.
    >>>> FYI, I use a Cangjie version 5 IME (第五代倉頡輸入法,
    >>>> "fifth-generation Cangjie input method") designed by and
    >>>> downloaded from Malaysia's Friends of Cangjie (倉頡之友·馬來西亞,
    >>>> at http://www.chinesecj.com/newsoftware/index3.php?Type=1), which
    >>>> promises to support input of some 70,000 characters.
    >>>> Are all Unihan kCangjie codes usable on my IME?
    >>>>
    >>>> Charlie
    >>>>
    >>>> --
    >>>> ___ Charlie Ruland ___
    >>>> ERROR__COMMVNIS__FACIT__IVS
    >>
    >
    >=====
    >John H. Jenkins
    >jenkins@apple.com