Re: Common IME Data Resources (Was: Mobile phones and Unicode support)

From: Ed Trager
Date: Fri Sep 05 2008 - 10:10:31 CDT


    Hi, everyone,

    > Essentially IMEs consist of several parts - a table of input keys which
    > maps to a list of one or more character strings; for open-source projects
    > these are potentially excellent common resources - including the often
    > underestimated resource of an input method understood by many.

    This is the primary type of common resource I was talking about. For
    methods such as the "Smart Pinyin" input method for Mandarin, the data
    table is quite large. And it is not a static table. The table can and
    should continue to grow as new words and phrases are coined or gain in
    currency or popularity.
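
    To make the idea of a growing key-to-candidates table concrete, here is
    a minimal sketch in Python. The class name ImeTable and its methods are
    hypothetical, invented only for illustration; no existing IME stores
    its data exactly this way as far as I know.

```python
from collections import defaultdict


class ImeTable:
    """Hypothetical in-memory IME table: key sequence -> candidate strings."""

    def __init__(self):
        # Each key sequence maps to an ordered list of candidate strings.
        self._entries = defaultdict(list)

    def add(self, keys, candidate):
        """Add a candidate for a key sequence, ignoring exact duplicates."""
        if candidate not in self._entries[keys]:
            self._entries[keys].append(candidate)

    def lookup(self, keys):
        """Return a copy of the candidates in insertion order (empty if none)."""
        return list(self._entries.get(keys, []))


table = ImeTable()
table.add("guge", "谷歌")
table.add("liulanqi", "浏览器")
# A newly coined phrase can be appended at any time; the table is not static.
table.add("gugeliulanqi", "谷歌浏览器")
```

    A real table would also carry frequencies and would live on disk, but
    the shape of the data is the same.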

    The English word "chrome" now has a "new" meaning with the release of
    Google's browser just a few days ago. In the future a localized name
    for the Chrome browser may become popular in Chinese (currently it
    seems to just be called "谷歌浏览器" -- "Google Browser").

    If there were a centralized project for managing IME shared data
    resources, new words could be added almost as quickly as they are
    coined. For a method like "Smart Pinyin", one can imagine an IME
    engine smart enough to periodically fetch updates via the internet,
    maybe even on-the-fly AJAX-style.
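
    As a rough sketch of what applying such an update might look like on
    the client side, assume the central project publishes updates as JSON
    objects mapping key sequences to lists of new candidates. That format
    and the function name merge_update are assumptions made up for this
    example, and the network fetch itself is left out.

```python
import json


def merge_update(table, payload):
    """Merge a JSON update into a local table and return the number added.

    Assumed (hypothetical) payload format: {"keys": ["candidate", ...], ...}
    `table` is a plain dict mapping key sequences to candidate lists.
    """
    update = json.loads(payload)
    added = 0
    for keys, candidates in update.items():
        existing = table.setdefault(keys, [])
        for candidate in candidates:
            # Skip candidates the local table already knows about.
            if candidate not in existing:
                existing.append(candidate)
                added += 1
    return added


local_table = {"guge": ["谷歌"]}
update_payload = json.dumps({"guge": ["谷歌"], "liulanqi": ["浏览器"]})
new_count = merge_update(local_table, update_payload)
```

    Because entries that are already present are skipped, the same update
    can be applied more than once without duplicating candidates.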

    > course the font required for display, which can be very device dependent. The
    > final part is the IME software itself that presents the candidates in some
    > form for selection; it is in this that many of the above projects differ, and
    > it is in some respects the hardest to integrate, let alone have common
    > resources for.

    For the more complex IMEs for languages like Chinese and Japanese, it
    is nevertheless not difficult to imagine packaging the core algorithms
    for selecting candidates in redistributable Open Source code.
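
    As a deliberately simple stand-in for such a core algorithm, the sketch
    below orders candidates by how often the user has picked them before.
    Frequency ranking is only one assumed strategy, and the function name
    rank_candidates and the sample counts are hypothetical.

```python
def rank_candidates(candidates, usage_counts):
    """Order candidates by descending usage count.

    Candidates missing from `usage_counts` count as zero. Python's sort is
    stable, so ties keep their original table order.
    """
    return sorted(candidates, key=lambda c: -usage_counts.get(c, 0))


# Illustrative data only: three candidates for the key sequence "shi".
candidates = ["十", "时", "是"]
usage_counts = {"是": 120, "时": 45}
ranked = rank_candidates(candidates, usage_counts)
```

    A production engine would add context, phrase segmentation and so on,
    but a small shared library could expose an interface about this narrow.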

    > Speaking of different methods, does anyone know about the copyright status of
    > the Wubi input method? What is copyrighted, and what is copyrightable? In
    > particular, if an IME uses a table containing the same keys as Wubi but is not
    > called Wubi, would this be acceptable?

    A couple of years ago I asked a similar question myself. As I
    recall, according to U.S. copyright law, a telephone book producer can
    copyright only their particular presentation of the data (i.e., in a
    bound printed form, for example), but not the data itself.

    >>> Increasing diversification in the mobile devices market is already leading
    >>> to the creation of projects like Android. It seems to me one could also
    >>> create a FOSS project around IMEs that Android and other devices, including
    >>> desktop computers, could all use.
    > A good idea if you can get it to work.
    > John Knightley
    >> Mmm, maybe a first start would be a technical paper that points out all that
    >> an IME ought to implement/expose in order to work towards a standard kind of
    >> framework. Not sure how well that would work though.
    >>> I certainly know which CJK input methods I like best, but they are
    >>> invariably not available on most computers and devices I have access
    >>> to.
    >> Being able to have a romaja Korean IME on Unix (through uim/m17n) is very
    >> nice and it's a drag to try and remember the keyboard layout on Windows for
    >> Korean, for example.
    >> --
    >> Jeroen Ruigrok van der Werven <asmodai(-at-)> / asmodai
    >> | | GPG: 2EAC625B
    >> To do injustice is more disgraceful than to suffer it...

    This archive was generated by hypermail 2.1.5 : Fri Sep 05 2008 - 10:15:40 CDT