Re: Common IME Data Resources (Was: Mobile phones and Unicode support)

From: vunzndi@vfemail.net
Date: Fri Sep 05 2008 - 13:47:01 CDT

Next message: Dreiheller, Albrecht: "CE Mark"

Previous message: Ed Trager: "Re: Common IME Data Resources (Was: Mobile phones and Unicode support)"
In reply to: Ed Trager: "Re: Common IME Data Resources (Was: Mobile phones and Unicode support)"

Quoting Ed Trager <ed.trager@gmail.com>:

> Hi, everyone,
>
>> Essentially, IMEs consist of several parts. One is a table mapping input
>> keys to a list of one or more character strings; for open-source projects
>> these are potentially excellent common resources, including the often
>> underestimated resource of an input method understood by many.
>
> This is the primary type of common resource I was talking about. For
> methods such as the "Smart Pinyin" input method for Mandarin, the data
> table is quite large. And it is not a static table. The table can and
> should continue to grow as new words and phrases are coined or gain in
> currency or popularity.
>

Yes, the amount of data used now is indeed large; this is what makes
present IMEs easier to use than earlier versions, and it is why having
common data is useful.
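As a minimal sketch of the kind of table described in the quoted text (input keys mapped to one or more candidate strings), the Python below assumes a hypothetical plain-text exchange format of one "keys<TAB>candidate" pair per line, with "#" comment lines. The names ImeTable and load_table are illustrative only and are not taken from any existing IME project.

```python
from collections import defaultdict


class ImeTable:
    """Hypothetical IME data table: input key sequence -> candidate strings."""

    def __init__(self):
        # Each key sequence maps to its candidates in the order they were added.
        self._entries = defaultdict(list)

    def add(self, keys, candidate):
        """Register a candidate for a key sequence, ignoring exact duplicates."""
        if candidate not in self._entries[keys]:
            self._entries[keys].append(candidate)

    def lookup(self, keys):
        """Return a copy of the candidates for an exact key sequence (empty if none)."""
        # .get() avoids creating an empty entry as a side effect of a lookup.
        return list(self._entries.get(keys, []))


def load_table(lines):
    """Build an ImeTable from lines in the assumed 'keys<TAB>candidate' format."""
    table = ImeTable()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if "\t" not in line:
            continue  # skip malformed lines (an assumed policy)
        keys, candidate = line.split("\t", 1)
        table.add(keys, candidate)
    return table
```

For example, loading the lines "zhong<TAB>中" and "zhong<TAB>种" makes a lookup of "zhong" return both characters in file order. A plain-text format like this is easy for several IME projects to share, whatever engine each one uses to present the candidates.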

> The English word "chrome" now has a "new" meaning with the release of
> Google's browser just a few days ago. In the future a localized name
> for the Chrome browser may become popular in Chinese (currently it
> seems to just be called "谷歌浏览器"--
> "Google Browser").
>
> If there were a centralized project for managing IME shared data
> resources, new words could be added almost as quickly as they are
> coined. For a method like "Smart Pinyin", one can imagine an IME
> engine smart enough to periodically fetch updates via the internet,
> maybe even on-the-fly AJAX-style.
>

Most Firefox translation plugins use an online dictionary.
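On the client side, the kind of periodic update Ed describes could reduce to merging a downloaded batch of new entries into the local table. The sketch below leaves out the fetching entirely and assumes both tables have a hypothetical shape of key sequence -> {candidate: frequency}. Keeping the higher of two frequencies is an assumed merge policy, not one taken from any real IME.

```python
def merge_update(local, update):
    """Merge an update (same shape as local) into local, in place.

    Both arguments are dicts of key sequence -> {candidate: frequency}.
    New key sequences and new candidates are added; for a candidate present
    in both, the higher frequency is kept (an assumed policy).
    Returns the number of candidates that were newly added.
    """
    added = 0
    for keys, candidates in update.items():
        # Create an empty bucket for key sequences the local table lacks.
        bucket = local.setdefault(keys, {})
        for word, freq in candidates.items():
            if word not in bucket:
                bucket[word] = freq
                added += 1
            else:
                bucket[word] = max(bucket[word], freq)
    return added
```

Using the Chrome example above, an update carrying {"gugeliulanqi": {"谷歌浏览器": 5}} would add one new entry under its pinyin spelling without disturbing existing entries.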

>> course, the font required for display, which can be very device-dependent.
>> The final part is the IME software itself, which presents the candidates in
>> some form for selection. It is in this part that many of the above projects
>> differ, and it is in some respects the hardest to integrate, let alone to
>> have common resources for.
>
> For the more complex IMEs for languages like Chinese and Japanese, it
> is nevertheless not difficult to imagine packaging the core algorithms
> for selecting candidates in redistributable Open Source code
> libraries.
>

I would be interested to hear which algorithms you think might be
suitable for such treatment.
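As one deliberately simple example of a candidate-selection routine that could live in a shared library, the sketch below collects every candidate whose key sequence starts with the typed prefix and ranks them by frequency. It assumes a hypothetical table shape of key sequence -> {candidate: frequency}. A production engine such as the "Smart Pinyin" method mentioned above would likely do considerably more (for example, segmenting input into syllables and ranking by context), none of which is shown here.

```python
def top_candidates(table, prefix, limit=5):
    """Return up to `limit` candidates whose key sequence starts with `prefix`.

    `table` is a dict of key sequence -> {candidate: frequency}.
    Results are ordered by descending frequency, ties broken by the
    candidate string itself so the output is deterministic.
    """
    scores = {}
    for keys, candidates in table.items():
        if keys.startswith(prefix):
            for word, freq in candidates.items():
                # A word reachable from several key sequences keeps its best score.
                scores[word] = max(freq, scores.get(word, 0))
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:limit]]
```

The linear scan keeps the code short; a larger table would more likely be held in a trie, or in a sorted key list searched with Python's bisect module, so that a prefix lookup does not touch every entry.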

>>
>> Speaking of different methods, does anyone know the copyright status of
>> the Wubi input method? What is copyrighted, and what is copyrightable? In
>> particular, if an IME uses a table containing the same keys as Wubi but is
>> not called Wubi, would this be acceptable?
>>
>
> A couple of years ago I myself had asked a similar question. As I
> recall, according to U.S. copyright law, a telephone book producer can
> copyright only their particular presentation of the data (i.e., in a
> bound printed form, for example), but not the data itself.
>

There is, of course, patent law as well, which might also apply. What
Chinese law says on this is also important. Wubi is of interest to me
because it is a widely used shape-based IME.

John

This archive was generated by hypermail 2.1.5 : Fri Sep 05 2008 - 13:53:55 CDT