Re: writing Chinese dialects

From: vunzndi@vfemail.net
Date: Sun Feb 04 2007 - 09:53:57 CST

    Arne

    Thank you, very interesting.

    For Extension B the best resource is Mr Taichi Kawabata's ids_irg.txt,
    which includes all the CJKV characters presently in Unicode, at

    <http://www.cse.cuhk.edu.hk/~irg/irg/irg25/IRGN1183A_ids_irg.txt.gz>

    I usually just grep it, sometimes

         $ grep AB ids_irg.txt

    but more often the "fuzzy"

        $ grep A ids_irg.txt | grep B
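
    The "fuzzy" search can be wrapped in a small shell function that takes
    any number of components. This is only a sketch: the name ids_search is
    made up for illustration, and it assumes the file holds one character
    per line together with its ideographic description sequence.

```shell
# Hypothetical helper (not part of any existing tool): print the lines of
# an IDS file that contain every given component, in any order.
# Assumes one character per line, with its IDS on the same line.
# Usage: ids_search FILE COMPONENT...
ids_search() {
    file=$1
    shift
    # Start from the whole file, then narrow it down one component at a time.
    result=$(cat "$file") || return 1
    for comp in "$@"; do
        # -F treats the component as a fixed string, not a regex.
        # grep exits non-zero when nothing matches, so stop early then.
        result=$(printf '%s\n' "$result" | grep -F -e "$comp") || return 1
    done
    printf '%s\n' "$result"
}
```

    For example, "ids_search ids_irg.txt A B" gives the same lines as the
    two chained greps above, and a third component narrows the list further.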

    For the very much smaller Extension C, which has still to be fully
    approved, there is my "very much a work in progress"
    ExtensionC_decomposed.txt, which gives only the IRG numbers since the
    characters are not yet official. I hope to update this very soon. To
    get it, please go to
    http://east-chr-data.cvs.sourceforge.net/east-chr-data/ExtensionC/data/tables/ExtensionC_decomposed.txt?view=log
    and download the latest version.

    According to this, at least 7 characters from your missing list are
    apparently in Extension C (file attached).

    John Knightley

    Quoting "Arne" <arne@linux.org.tw>:

    > On Saturday 27 January 2007 13:35, John H. Jenkins wrote:
    >> I would love to see them, too, and will gladly add them to Unicode's
    >> database of known unencoded ideographs (provided we get reasonable
    >> pointers to documentation as well).
    >>
    >> Unfortunately, the ship has sailed on Extension D. Actual proposals
    >> to encode these will have to wait for Extension E.
    >
    > Ok, I have scanned the list.
    > The pdf is here:
    > http://debian.linux.org.tw/~arne/MinNan_IM/Minnan_missing001.pdf
    >
    > I also composed a list of all missing characters (the invented ones and
    > others from the same dictionary) with ideographic description
    > sequences.
    > The list is here:
    > http://debian.linux.org.tw/~arne/MinNan_IM/missing.txt
    >
    > At least I couldn't find those characters in Unicode... maybe I have
    > overlooked a few...
    >
    > which brings me to another question:
    > Does anyone have / know a tool where I can search CJK characters in
    > Unicode based on the components they are made of?
    > I'm particularly interested in Ext.B characters, because it's a PITA to
    > scan the PDF manually. The Radical/Stroke search on the Unicode webpage
    > is not always a big help, since it is not always clear to which radical
    > a character belongs, especially in Ext.B... :(
    >
    > So, I'm looking for something like this:
    >
    > I want to get the codepoint of the character 𣍐.
    > I search for the components 勿 and 會. Then the character 𣍐 should be
    > displayed with its codepoint U+23350.
    >
    > If this kind of database doesn't exist yet, who is with me to create
    > one?
    >
    > For the references of the above mentioned missing characters, I would
    > need some time to collect them... I guess a scan of the dictionary page
    > in question is not sufficient, is it?
    >
    > (I also have an additional list of missing characters from a Hakka
    > dictionary... but unfortunately I need to dig out the characters from
    > the dictionary myself, the author didn't provide me a list of them...
    > so it will take some time until the list is complete.)
    >
    > Cheers
    > Arne
    > --
    > Arne G






    This archive was generated by hypermail 2.1.5 : Sun Feb 04 2007 - 09:56:34 CST