Re: Common IME Data Resources (Was: Mobile phones and Unicode support)

Date: Fri Sep 05 2008 - 13:47:01 CDT


    Quoting Ed Trager <>:

    > Hi, everyone,
    >> Essentially IMEs consist of several parts - a table of input keys which
    >> maps to a list of one or more character strings; for open source projects these
    >> are potentially excellent common resources - including the often
    >> underestimated resource of an input method understood by many.
    > This is the primary type of common resource I was talking about. For
    > methods such as the "Smart Pinyin" input method for Mandarin, the data
    > table is quite large. And it is not a static table. The table can and
    > should continue to grow as new words and phrases are coined or gain in
    > currency or popularity.

    Yes, the amount of data used now is indeed large. This is what makes
    present IMEs easier to use than earlier versions, and why having
    common data is useful.
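
To make the "table of input keys" idea concrete, here is a minimal sketch in Python. It assumes the simplest possible layout: a key sequence mapped to an ordered list of candidate strings. The class name `InputTable` and the pinyin entries are illustrative assumptions, not taken from any existing IME project.

```python
from collections import defaultdict


class InputTable:
    """Hypothetical IME data table: key sequence -> candidate strings."""

    def __init__(self):
        # Each key sequence maps to an ordered list of candidates.
        self._entries = defaultdict(list)

    def add(self, keys, candidate):
        """Add a candidate for a key sequence, ignoring duplicates."""
        if candidate not in self._entries[keys]:
            self._entries[keys].append(candidate)

    def lookup(self, keys):
        """Return a copy of the candidates for an exact key sequence."""
        return list(self._entries.get(keys, []))


# Illustrative entries only (assumed pinyin spellings without tones).
table = InputTable()
table.add("guge", "谷歌")
table.add("liulanqi", "浏览器")
table.add("guge", "谷歌")  # duplicate, ignored
```

A shared resource in this form is independent of the IME engine: any project that can read a key-to-candidates mapping could consume it.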

    > The English word "chrome" now has a "new" meaning with the release of
    > Google's browser just a few days ago. In the future a localized name
    > for the Chrome browser may become popular in Chinese (currently it
    > seems to just be called "谷歌浏览器"--
    > "Google Browser").
    > If there were a centralized project for managing IME shared data
    > resources, new words could be added almost as quickly as they are
    > coined. For a method like "Smart Pinyin", one can imagine an IME
    > engine smart enough to periodically fetch updates via the internet,
    > maybe even on-the-fly AJAX-style.

    Most Firefox translation plugins use an online dictionary.
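
As a rough sketch of what absorbing such updates might look like on the client side, the following Python merges a batch of new entries into a local table. It assumes the update has already been fetched and parsed into a plain dict; the function name `merge_update`, the dict layout, and the sample entries are all hypothetical, and no real update service or wire format is implied.

```python
from typing import Dict, List


def merge_update(table: Dict[str, List[str]],
                 update: Dict[str, List[str]]) -> int:
    """Merge new candidates into the local table; return how many were added."""
    added = 0
    for keys, candidates in update.items():
        existing = table.setdefault(keys, [])
        for candidate in candidates:
            # Keep entries the user already has; append only new ones.
            if candidate not in existing:
                existing.append(candidate)
                added += 1
    return added


# Illustrative data only: a local table and a freshly coined phrase.
local = {"guge": ["谷歌"]}
update = {"guge": ["谷歌"], "gugeliulanqi": ["谷歌浏览器"]}
added = merge_update(local, update)
```

Appending rather than replacing keeps the existing candidate order stable, so a periodic update would not reshuffle what the user is used to seeing.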

    >> course the font required for display, which can be very device dependent. The
    >> final part is the IME software itself, which presents the candidates in some
    >> form for selection; it is in this that many of the above projects differ, and it is in
    >> some respects the hardest to integrate, let alone have common resources for.
    > For the more complex IMEs for languages like Chinese and Japanese, it
    > is nevertheless not difficult to imagine packaging the core algorithms
    > for selecting candidates in redistributable Open Source code
    > libraries.

    I would be interested to hear what algorithms you think might be
    suitable for such treatment.
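
As one deliberately simple starting point for that discussion, candidate selection could be sketched as prefix matching followed by frequency ranking. This is only an assumption about what a shareable core might contain, not a description of how any existing IME works; `rank_candidates`, the table layout, and the frequency counts below are made up for illustration.

```python
from typing import Dict, List


def rank_candidates(prefix: str,
                    table: Dict[str, List[str]],
                    freq: Dict[str, int],
                    limit: int = 5) -> List[str]:
    """Return candidates whose key starts with prefix, most frequent first."""
    matches: List[str] = []
    for keys, candidates in table.items():
        if keys.startswith(prefix):
            matches.extend(candidates)
    # Drop duplicates while preserving first-seen order.
    unique = list(dict.fromkeys(matches))
    # Stable sort: ties keep their table order.
    unique.sort(key=lambda c: -freq.get(c, 0))
    return unique[:limit]


# Illustrative data only; the counts are invented.
table = {"gu": ["古", "谷"], "guge": ["谷歌"]}
freq = {"谷歌": 50, "古": 30, "谷": 10}
ranked = rank_candidates("gu", table, freq)
```

Real engines for Chinese or Japanese do considerably more (segmentation, context, user history), so this shows only the shape of a library interface, not a proposal for the algorithm itself.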

    >> Speaking of different methods, does anyone know about the copyright status of
    >> the Wubi input method? What is copyrighted, and what is copyrightable? In
    >> particular, if an IME uses a table containing the same keys as Wubi but is not
    >> called Wubi, would this be acceptable?
    > A couple of years ago I myself had asked a similar question. As I
    > recall, according to U.S. copyright law, a telephone book producer can
    > copyright only their particular presentation of the data (i.e., in a
    > bound printed form, for example), but not the data itself.

    There is of course patent law as well, which might also apply, and
    what Chinese law says on this matters too. Wubi is of interest to me
    because it is a widely used shape-based IME.



    This archive was generated by hypermail 2.1.5 : Fri Sep 05 2008 - 13:53:55 CDT