Re: writing Chinese dialects

Date: Sun Feb 04 2007 - 18:09:19 CST

    Dear Arne,

    I would certainly welcome help putting the data into standard IDS
    format. The file is exported from a database of mine that uses a
    format similar to IDS (close enough for a fuzzy search as described
    below). I do have a more recent version, which I think is too big for
    the mailing list, so I will send it to you separately. Briefly, the
    ideas are:
    1. ? and ?? mark a missing or uncertain character or data (similar to
       ids_irg.txt, where ? usually denotes a missing character)
    2. +, - and brackets, with the obvious usage
    3. Infix A+B combinations, as opposed to the prefix (Polish notation)
       +AB ordering of Mr Taichi Kawabata
    4. A-B is permitted where the part/radical is not in Unicode

    It would be fair to say that only the fourth option, allowing A-B, is
    particularly useful. In other respects Mr Taichi Kawabata's system is
    much better for doing sophisticated searches where IDS are flattened,
    that is, broken down into parts before searching.
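
    As a rough illustration of what flattening means here, the sketch
    below expands each character's IDS recursively until only leaf parts
    remain, then searches on those parts. The SAMPLE_IDS table and the
    function names flatten and search are made up for this example and
    are not taken from ids_irg.txt or from any existing tool.

```python
# Ideographic Description Characters U+2FF0..U+2FFB are layout operators,
# not components, so they are skipped when collecting parts.
IDC_RANGE = range(0x2FF0, 0x2FFC)

# Tiny illustrative table (character -> IDS); hypothetical sample data.
SAMPLE_IDS = {
    "明": "⿰日月",
    "盟": "⿱明皿",
    "好": "⿰女子",
}


def flatten(char, table, _seen=None):
    """Return the set of leaf components of char, expanding nested IDS."""
    seen = _seen if _seen is not None else set()
    ids = table.get(char)
    # Characters without an entry (or already being expanded) are leaves.
    if ids is None or ids == char or char in seen:
        return {char}
    seen = seen | {char}
    parts = set()
    for c in ids:
        if ord(c) in IDC_RANGE:
            continue
        parts |= flatten(c, table, seen)
    return parts


def search(parts, table):
    """Return characters whose flattened parts include all the given parts."""
    wanted = set()
    for p in parts:
        wanted |= flatten(p, table)  # flatten the query as well
    return sorted(ch for ch in table if wanted <= flatten(ch, table))


if __name__ == "__main__":
    print(flatten("盟", SAMPLE_IDS))
    print(search("日月", SAMPLE_IDS))
```

    Because both the table and the query are flattened, a search for 明
    also finds 盟, which a plain substring match on 日 and 月 would miss
    unless the nested entry had been expanded first.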

    A straight substitution leaves the order incorrect. I therefore left
    the data with its +, - and brackets so that it would be obvious
    that there was a difference. I was planning to reorder after doing one
    last check of the data.
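
    The reordering step could look something like the sketch below. It
    rests on several assumptions that are not stated above: every part
    is a single character, + and - are binary and left-associative with
    equal precedence, and brackets only group. The function name
    infix_to_prefix is hypothetical. It keeps + and - as they are and
    does not map them to real ideographic description characters, since
    that mapping depends on the data.

```python
def infix_to_prefix(expr):
    """Rewrite an infix expression such as (A+B)-C as prefix: -+ABC."""
    tokens = [c for c in expr if not c.isspace()]
    pos = 0

    def parse_operand():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            inner = parse_expression()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing bracket")
            pos += 1
            return inner
        if tok in "+-)":
            raise ValueError(f"unexpected {tok!r}")
        return tok  # a single-character part, including ?

    def parse_expression():
        nonlocal pos
        left = parse_operand()
        # Fold operators left to right: A+B+C becomes ++ABC.
        while pos < len(tokens) and tokens[pos] in "+-":
            op = tokens[pos]
            pos += 1
            right = parse_operand()
            left = op + left + right
        return left

    result = parse_expression()
    if pos != len(tokens):
        raise ValueError("unexpected trailing input")
    return result


if __name__ == "__main__":
    for example in ["A+B", "(A+B)-C", "A+(B-C)"]:
        print(example, "->", infix_to_prefix(example))
```

    Entries that do not fit these assumptions, for example the
    two-character ?? marker, raise ValueError instead of being reordered
    silently, so they can be checked by hand.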


    Quoting Arne:

    > On Sunday 04 February 2007 23:53, wrote:
    >> For Extension B the best is Mr Taichi Kawabata's ids_irg.txt which
    >> includes all the CJKV characters presently in Unicode at
    >> <>
    >> I usually just grep it, sometimes
    >> $ grep AB ids_irg.txt
    >> but more often the "fuzzy"
    >> $ grep A ids_irg.txt | grep B
    >> For the very much smaller, and still to be fully passed, Extension C,
    >> there is my "very much a work in progress"
    >> ExtensionC_decomposed.txt, which gives only the IRG numbers since the
    >> characters are not yet official. I hope to update this very soon. For
    >> this please go to
    >> a/tables/ExtensionC_decomposed.txt?view=log and download the latest
    >> version.
    >> According to this, at least 7 characters from your missing list are
    >> apparently in Extension C (file attached).
    >> John Knightley
    > Thanks very much, both of you. I think this will help a lot for finding
    > more "missing" characters... :)
    > John, may I help you to update your Ext. C file to use the "correct" IDS
    > instead of "/" and "+" ? ;) I would send you a diff then...
    > Cheers
    > Arne
    > --
    > Arne G

