Re: writing Chinese dialects

From: vunzndi@vfemail.net
Date: Sun Feb 04 2007 - 18:09:19 CST

Next message: Otto Stolz: "Re: ZWJ, ZWNJ and VS in Latin and other Greek-derived scripts"

Previous message: vunzndi@vfemail.net: "Re: writing Chinese dialects"
In reply to: Arne Götje (高盛華): "Re: writing Chinese dialects"
Next in thread: Philippe Verdy: "Re: writing Chinese dialects"
Reply: Philippe Verdy: "Re: writing Chinese dialects"

Dear Arne,

I would certainly welcome help putting the data into standard IDS
format. The file is exported from a database of mine that uses a
format similar to IDS (close enough for a fuzzy search as described
below). I do have a more recent version, which I think is too big for
the mailing list, so I will send it to you separately. Briefly, the
ideas are:
  1. ? and ?? mark a missing or uncertain character/data (similar to
     ids_irg.txt, where ? usually denotes a missing character)
  2. +, - and brackets, with the obvious usage
  3. A+B combinations, as opposed to Mr Taichi Kawabata's prefix
     (Polish notation) +AB ordering
  4. A-B permitted where the part/radical is not in Unicode

It would be fair to say that only the 4th option, allowing A-B, is
particularly useful; in other respects Mr Taichi Kawabata's system is
much better for doing sophisticated searches where the IDS are
flattened, that is, broken down into parts before searching.
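
As a rough illustration of searching flattened data, here is a minimal
Python sketch of the order-independent "fuzzy" match. The function name
fuzzy_search and the three sample entries are made up for illustration;
the tab-separated layout is an assumption and may not match the real
ids_irg.txt.

```python
def fuzzy_search(lines, parts):
    """Return the lines that contain every component in `parts`, in any order."""
    return [line for line in lines if all(part in line for part in parts)]


# Hypothetical sample entries (code point, character, decomposition).
# The real file layout may differ; this is only an assumed shape.
sample = [
    "U+597D\t好\t⿰女子",
    "U+674E\t李\t⿱木子",
    "U+6797\t林\t⿰木木",
]

# Only the entry that contains both components is kept.
matches = fuzzy_search(sample, ["木", "子"])
```

Because each component is tested separately, the result does not depend
on the order in which the components appear in the decomposition.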

A straight substitution leaves the order incorrect. I therefore left
the data with its +, - and brackets so that it would be obvious
that there was a difference. I was planning to reorder after doing
one last check of the data.
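
The reordering could be sketched roughly as below. This is not the
actual conversion, only an illustration under stated assumptions: every
part is a single character, + and - are both binary and group left to
right, and the brackets are round brackets. The function name
infix_to_prefix is hypothetical, the operators are kept as they are
rather than mapped to real IDS operators, and multi-character markers
such as ?? are not handled.

```python
def infix_to_prefix(expr):
    """Rewrite an infix expression such as (A+B)-C in prefix order: -+ABC."""
    tokens = [ch for ch in expr if not ch.isspace()]
    pos = 0

    def parse_operand():
        # An operand is a single character or a bracketed sub-expression.
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            inner = parse_sequence()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing bracket")
            pos += 1
            return inner
        if tok in "+-)":
            raise ValueError(f"unexpected {tok!r}")
        return tok

    def parse_sequence():
        # Assumption: + and - have equal precedence and group left to right.
        nonlocal pos
        result = parse_operand()
        while pos < len(tokens) and tokens[pos] in "+-":
            op = tokens[pos]
            pos += 1
            right = parse_operand()
            result = op + result + right
        return result

    out = parse_sequence()
    if pos != len(tokens):
        raise ValueError("unexpected trailing input")
    return out
```

For example, infix_to_prefix("A+B") gives "+AB", and the brackets
disappear because the prefix order already fixes the grouping.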

John

Quoting Arne <arne@linux.org.tw>:

> On Sunday 04 February 2007 23:53, vunzndi@vfemail.net wrote:
>> For Extension B the best is Mr Taichi Kawabata's ids_irg.txt which
>> includes all the CJKV characters presently in Unicode at
>>
>> <http://www.cse.cuhk.edu.hk/~irg/irg/irg25/IRGN1183A_ids_irg.txt.gz>
>>
>> I usually just grep it, sometimes
>>
>> $ grep AB ids_irg.txt
>>
>> but more often the "fuzzy"
>>
>> $ grep A ids_irg.txt | grep B
>>
>>
>> For the very much smaller, and still to be fully passed, Extension C,
>> there is my "very much a work in progress"
>> ExtensionC_decomposed.txt, which gives only the IRG numbers since the
>> characters are not yet official. I hope to update this very soon. For
>> this please go to
>> http://east-chr-data.cvs.sourceforge.net/east-chr-data/ExtensionC/data/tables/ExtensionC_decomposed.txt?view=log
>> and download the latest version.
>>
>> According to this, at least 7 characters from your missing list are
>> apparently in Extension C (file attached).
>>
>> John Knightley
>
> Thanks very much, both of you. I think this will help a lot for finding
> more "missing" characters... :)
>
> John, may I help you to update your Ext. C file to use the "correct" IDS
> instead of "/" and "+" ? ;) I would send you a diff then...
>
> Cheers
> Arne
> --
> Arne G



This archive was generated by hypermail 2.1.5 : Sun Feb 04 2007 - 18:11:02 CST