From: Markus Scherer ([email protected])
Date: Fri Feb 17 2006 - 12:31:28 CST
I assume it's an oversight for Java to not return the Han numeric
values. They are Unicode property values for those characters. Due to
the limited syntax of UnicodeData.txt, they are available only in
Unihan.txt and, as Andrew said, in extracted/DerivedNumericValues.txt
- the latter is small and easy to parse.
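For anyone who does want to parse the derived file rather than pull in a library, a minimal sketch in Java might look like the following. It assumes each data line is semicolon-separated, with a code point (or a "first..last" range) in the first field and the decimal numeric value in the second, and that everything after '#' is a comment. The class name and the sample lines are made up for illustration; check the layout against the copy of DerivedNumericValues.txt you actually have.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the JDK or ICU.
class DerivedNumericValuesParser {

    /** Maps each code point to its numeric value, given lines in the assumed layout. */
    static Map<Integer, Double> parse(List<String> lines) {
        Map<Integer, Double> values = new HashMap<>();
        for (String line : lines) {
            // Drop the trailing comment, then skip blank or comment-only lines.
            int hash = line.indexOf('#');
            String data = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (data.isEmpty()) {
                continue;
            }

            String[] fields = data.split(";");
            if (fields.length < 2) {
                continue; // not a data line in the assumed layout
            }

            // Field 0: a single code point ("4E09") or a range ("0031..0033").
            String[] range = fields[0].trim().split("\\.\\.");
            int first = Integer.parseInt(range[0], 16);
            int last = range.length > 1 ? Integer.parseInt(range[1], 16) : first;

            // Field 1: the numeric value written as a decimal.
            double value = Double.parseDouble(fields[1].trim());
            for (int cp = first; cp <= last; cp++) {
                values.put(cp, value);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Illustrative lines only; a real run would read the file itself.
        List<String> sample = Arrays.asList(
            "# sample data",
            "4E09          ; 3.0 # Lo       CJK UNIFIED IDEOGRAPH-4E09",
            "4E94          ; 5.0 # Lo       CJK UNIFIED IDEOGRAPH-4E94");
        Map<Integer, Double> values = parse(sample);
        System.out.println(values.get(0x4E09));
        System.out.println(values.get(0x4E94));
    }
}
```

To run it against the real file, the lines could be read with java.nio.file.Files.readAllLines and passed to parse unchanged.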
ICU4J's UCharacter class returns the numeric values for Han
characters. You could just use ICU instead of rolling your own.
http://icu.sourceforge.net/apiref/icu4j/com/ibm/icu/lang/UCharacter.html#getUnicodeNumericValue(int)
http://www-306.ibm.com/software/globalization/icu/downloads.jsp
Best regards,
markus
On 2/17/06, Kit Peters <[email protected]> wrote:
> Well, *I'm* only interested in numbers, but the larger project that I'm
> working within covers all of Unicode.
>
> On 2/17/06, Andrew West < [email protected]> wrote:
> > On 17/02/06, Kit Peters < [email protected]> wrote:
> > > 1) Is there a native Java way to retrieve the numeric values for these
> > > characters (i.e. a way that doesn't involve me parsing Unihan.txt)?
> >
> > If you're only interested in numbers, why not parse the following
> > files directly, instead of UnicodeData.txt and Unihan.txt? They cover
> > all characters defined as numbers by Unicode, including CJK
> > ideographs.
-- Opinions expressed here may not reflect my company's positions unless otherwise noted.