From: vunzndi@vfemail.net
Date: Sun Feb 04 2007 - 09:53:57 CST
Arne
Thank you, very interesting.
For Extension B, the best resource is Mr Taichi Kawabata's ids_irg.txt,
which includes all the CJKV characters presently in Unicode, at
<http://www.cse.cuhk.edu.hk/~irg/irg/irg25/IRGN1183A_ids_irg.txt.gz>
I usually just grep it, sometimes for two components A and B in sequence,
$ grep AB ids_irg.txt
but more often with the "fuzzy" form, which matches A and B in any position:
$ grep A ids_irg.txt | grep B
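For anyone who prefers a script to a grep pipeline, the same fuzzy search can be sketched in Python. This is only a rough sketch: it assumes each line has three tab-separated fields (codepoint, character, IDS), which should be checked against the actual file, and the names find_by_components and SAMPLE are made up for illustration. The sample lines are common BMP characters, not entries copied from ids_irg.txt.

```python
import io


def find_by_components(lines, components):
    """Yield (codepoint, char, ids) for entries whose IDS holds every component.

    Assumes tab-separated lines of the form: codepoint, character, IDS.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip lines that do not fit the assumed layout
        codepoint, char, ids = fields[0], fields[1], fields[2]
        # "Fuzzy" match: every requested component appears somewhere in the IDS.
        if all(c in ids for c in components):
            yield codepoint, char, ids


# Hypothetical sample data in the assumed layout.
SAMPLE = "U+660E\t明\t⿰日月\nU+6797\t林\t⿰木木\nU+68EE\t森\t⿱木林\n"

matches = list(find_by_components(io.StringIO(SAMPLE), "日月"))
for codepoint, char, ids in matches:
    print(codepoint, char, ids)
```

Unlike the grep pipeline, this looks only at the IDS field, so a character does not match merely because it is itself one of the search components. It does not expand nested components either, so a component is found only if it appears literally in the stored sequence. For the real file, pass open("ids_irg.txt", encoding="utf-8") in place of the StringIO object.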
For the very much smaller Extension C, which has still to be fully passed,
there is my "very much a work in progress" ExtensionC_decomposed.txt,
which gives only the IRG numbers, since the characters are not yet
official. I hope to update this very soon. To get it, please go to
http://east-chr-data.cvs.sourceforge.net/east-chr-data/ExtensionC/data/tables/ExtensionC_decomposed.txt?view=log
and download the latest version.
According to this, at least 7 characters from your missing list are
apparently in Extension C (file attached).
John Knightley
Quoting "Arne (ʢ)" <arne@linux.org.tw>:
> On Saturday 27 January 2007 13:35, John H. Jenkins wrote:
>> I would love to see them, too, and will gladly add them to Unicode's
>> database of known unencoded ideographs (provided we get reasonable
>> pointers to documentation as well).
>>
>> Unfortunately, the ship has sailed on Extension D. Actual proposals
>> to encode these will have to wait for Extension E.
>
> Ok, I have scanned the list.
> The pdf is here:
> http://debian.linux.org.tw/~arne/MinNan_IM/Minnan_missing001.pdf
>
> I also composed a list of all missing characters (the invented ones and
> others from the same dictionary) with ideographic description
> sequences.
> The list is here:
> http://debian.linux.org.tw/~arne/MinNan_IM/missing.txt
>
> At least I couldn't find those characters in Unicode... maybe I have
> overlooked a few...
>
> which brings me to another question:
> Does anyone have / know a tool where I can search CJK characters in
> Unicode based on the components they are made of?
> I'm particularly interested in Ext.B characters, because it's a PITA to
> scan the PDF manually. The Radical/Stroke search on the Unicode webpage
> is not always a big help, since it is not always clear to which radical
> a character belongs, especially in Ext.B... :(
>
> So, I'm looking for something like this:
>
> I want to get the codepoint of the character 𣍐.
> I search for the components and . Then the character 𣍐 should be
> displayed with its codepoint U+23350.
>
> If this kind of database doesn't exist yet, who is with me to create
> one?
>
> For the references of the above mentioned missing characters, I would
> need some time to collect them... I guess a scan of the dictionary page
> in question is not sufficient, is it?
>
> (I also have an additional list of missing characters from a Hakka
> dictionary... but unfortunately I need to dig out the characters from
> the dictionary myself, the author didn't provide me a list of them...
> so it will take some time until the list is complete.)
>
> Cheers
> Arne
> --
> Arne G