Fwd: Unihan SQL access

From: Uriah Eisenstein (uriaheisenstein@gmail.com)
Date: Thu Sep 30 2010 - 13:48:58 CDT


As usual, this took longer than I thought... But an initial version is
finally ready, and can be found at
http://babelfish.50webs.com/unihan-sql-browser/Unihan%20SQL%20Browser.html.
It requires access to the Unihan.zip file and a JDBC driver; there are
explanations on the web page which I hope will be enough. Quite a few
improvements are already planned... I'd be glad to hear if anyone finds it
useful.

While at it, I found a couple of apparent typos in the source indications
of variants (using SELECT DISTINCT SOURCE FROM VARIANT_SOURCE). These all
come from the kSemanticVariant field:

SELECT * FROM kSemanticVariant_source
WHERE kSemanticVariant_source IN ('kMathews', 'kMeterWempe')

[U+3C92] 勽 [U+52FD] kMathews
勽 [U+52FD] [U+3C92] kMathews
[U+25500] 渹 [U+6E39] kMeterWempe
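
For readers without the Java tool at hand, the check above can be roughly
reproduced with a short Python sketch using the standard sqlite3 module. This
is not the program described in this message. The variant_source table layout
(code, variant, source) is an assumption, and the sample lines are
reconstructed from the three results quoted above rather than copied from
Unihan. The two source names are presumably misspellings of kMatthews and
kMeyerWempe.

```python
import sqlite3

# Sample lines reconstructed from the three results quoted above, in the
# tab-separated "code point, field, value" layout of the Unihan files.
SAMPLE = """\
U+3C92\tkSemanticVariant\tU+52FD<kMathews
U+52FD\tkSemanticVariant\tU+3C92<kMathews
U+25500\tkSemanticVariant\tU+6E39<kMeterWempe
"""

def variant_rows(text):
    """Yield (code, variant, source) for every source tag on every variant."""
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        code, field, value = line.split("\t")
        if field != "kSemanticVariant":
            continue
        for entry in value.split():
            variant, _, tags = entry.partition("<")
            for tag in filter(None, tags.split(",")):
                # Drop any ":T"-style suffix that follows the source name.
                yield code, variant, tag.split(":")[0]

# Hypothetical schema; the real tool's tables may look different.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE variant_source (code TEXT, variant TEXT, source TEXT)")
conn.executemany("INSERT INTO variant_source VALUES (?, ?, ?)",
                 variant_rows(SAMPLE))

# Listing the distinct source names is what makes misspellings stand out.
sources = [row[0] for row in conn.execute(
    "SELECT DISTINCT source FROM variant_source ORDER BY source")]

# Then pull the rows that carry the suspicious names.
suspect = conn.execute(
    "SELECT code, variant, source FROM variant_source "
    "WHERE source IN ('kMathews', 'kMeterWempe') ORDER BY code").fetchall()

print(sources)
print(suspect)
```

Run against the full Unihan variants file instead of SAMPLE, the DISTINCT
query should list every source name in use, so a one-off spelling shows up as
an extra entry next to the expected one.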

Regards,
Uriah Eisenstein

---------- Forwarded message ----------
From: Uriah Eisenstein <uriaheisenstein@gmail.com>
Date: Sun, Sep 12, 2010 at 5:57 PM
Subject: Unihan SQL access
To: unicode List <unicode@unicode.org>

Hello,
I'm nearing completion of a simple Java program which loads Unihan data from
the source files into a DB, and provides SQL access to it. There's still at
least a week or so of work on issues I consider essential, but once ready
I'd be happy to make it available on the Internet if anyone's interested.
So far I've used it to search for possibly erroneous data in Unihan; my
latest find is that 73 characters have a kTaiwanTelegraph value of 0000,
which seems doubtful. It can also be useful for various statistical
information such as how many characters are listed under each radical, or
which blocks include IICore characters.
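
The kinds of queries mentioned above might look roughly like the following
Python sketch, which uses the standard sqlite3 module in place of the Java
program. The generic unihan(code, field, value) table is an assumption, not
the program's actual schema, and the sample rows are invented placeholders
rather than real Unihan data, so the counts it prints say nothing about the
73 characters found.

```python
import sqlite3

# Invented placeholder rows in the Unihan "code point, field, value" layout;
# the values are for illustration only and are not real Unihan data.
SAMPLE = [
    ("U+4E00", "kTaiwanTelegraph", "0001"),
    ("U+4E01", "kTaiwanTelegraph", "0000"),
    ("U+4E03", "kTaiwanTelegraph", "0000"),
    ("U+4E00", "kRSUnicode", "1.0"),
    ("U+4E01", "kRSUnicode", "1.1"),
    ("U+4E8C", "kRSUnicode", "7.0"),
]

# Hypothetical schema. Values are stored as TEXT so that '0000' keeps its
# leading zeros instead of collapsing to the number 0.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE unihan (code TEXT, field TEXT, value TEXT)")
conn.executemany("INSERT INTO unihan VALUES (?, ?, ?)", SAMPLE)

# How many characters have a kTaiwanTelegraph value of 0000?
zero_count = conn.execute(
    "SELECT COUNT(*) FROM unihan "
    "WHERE field = 'kTaiwanTelegraph' AND value = '0000'").fetchone()[0]

# How many characters are listed under each radical? The radical number is
# the part of the kRSUnicode value before the '.'.
per_radical = conn.execute(
    "SELECT CAST(substr(value, 1, instr(value, '.') - 1) AS INTEGER) AS radical, "
    "COUNT(*) FROM unihan WHERE field = 'kRSUnicode' "
    "GROUP BY radical ORDER BY radical").fetchall()

print(zero_count)
print(per_radical)
```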
I'm also considering adding the contents of the Unicode Character Database
at a later phase.
Regards,
Uriah Eisenstein


