Re: Unihan numeric support in Java?

From: Markus Scherer
Date: Fri Feb 17 2006 - 12:31:28 CST

    I assume it's an oversight that Java does not return the Han numeric
    values. They are Unicode property values for those characters. Due to
    the limited syntax of UnicodeData.txt, they are available only in
    Unihan.txt and, as Andrew said, in extracted/DerivedNumericValues.txt;
    the latter is small and easy to parse.
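
    To show roughly how little parsing DerivedNumericValues.txt needs, here
    is a minimal Java sketch. It assumes each data line starts with a code
    point or an "XXXX..YYYY" range, then ";" and the value in decimal form,
    with "#" starting a comment. The column layout has changed between
    Unicode versions, so only the first two fields are read. The class and
    method names are made up for this example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for extracted/DerivedNumericValues.txt.
// Assumed line format: "<code point or XXXX..YYYY> ; <decimal value> ; ... # comment".
public class DerivedNumericValues {

    // Maps every listed code point to the numeric value in its second field.
    public static Map<Integer, Double> parse(List<String> lines) {
        Map<Integer, Double> values = new TreeMap<>();
        for (String raw : lines) {
            // Drop the trailing comment and surrounding whitespace.
            int hash = raw.indexOf('#');
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            if (line.isEmpty()) {
                continue; // blank or comment-only line
            }
            String[] fields = line.split(";");
            if (fields.length < 2) {
                continue; // not a data line in the assumed format
            }
            String cps = fields[0].trim();
            double value = Double.parseDouble(fields[1].trim());

            // First field: a single code point or an inclusive range "XXXX..YYYY".
            int start;
            int end;
            int dots = cps.indexOf("..");
            if (dots >= 0) {
                start = Integer.parseInt(cps.substring(0, dots), 16);
                end = Integer.parseInt(cps.substring(dots + 2), 16);
            } else {
                start = Integer.parseInt(cps, 16);
                end = start;
            }
            for (int cp = start; cp <= end; cp++) {
                values.put(cp, value);
            }
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines;
        if (args.length > 0) {
            // Path to a local copy of the UCD file.
            lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        } else {
            // Small sample in the assumed format, not copied from a specific UCD release.
            lines = Arrays.asList(
                "# DerivedNumericValues sample",
                "0F33 ; -0.5 ; ; -1/2 # No TIBETAN DIGIT HALF ZERO",
                "4E94 ; 5.0 ; ; 5 # Lo CJK UNIFIED IDEOGRAPH-4E94");
        }
        Map<Integer, Double> values = parse(lines);
        System.out.println(values.size() + " code points with numeric values");
        System.out.println("U+4E94 -> " + values.get(0x4E94));
    }
}
```

    Pass the path to a local copy of the file as the first argument; without
    one it parses the built-in sample. Reading the decimal field as a double
    keeps fractional and negative values such as -0.5 intact.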

    ICU4J's UCharacter class returns the numeric values for Han
    characters. You could just use ICU instead of rolling your own.
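
    For comparison, a minimal ICU4J sketch. It assumes ICU4J is on the
    classpath and uses UCharacter.getUnicodeNumericValue(int), which returns
    a double (some numeric values are fractions or too large for an int) or
    UCharacter.NO_NUMERIC_VALUE when the code point has none. The wrapper
    class and method names are illustrative only.

```java
import com.ibm.icu.lang.UCharacter;

// Illustrative wrapper around ICU4J's numeric-value lookup (requires ICU4J).
public class HanNumericDemo {

    // Returns the Unicode numeric value of cp, or null if it has none.
    public static Double numericValue(int cp) {
        double v = UCharacter.getUnicodeNumericValue(cp);
        return v == UCharacter.NO_NUMERIC_VALUE ? null : v;
    }

    public static void main(String[] args) {
        // U+4E94 (five), U+5341 (ten), U+842C (ten thousand), and 'A' for contrast.
        int[] samples = {0x4E94, 0x5341, 0x842C, 'A'};
        for (int cp : samples) {
            System.out.printf("U+%04X %s -> %s%n",
                cp, new String(Character.toChars(cp)), numericValue(cp));
        }
    }
}
```

    Returning null instead of the sentinel keeps "no value" from leaking
    into arithmetic by accident.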

    Best regards,

    On 2/17/06, Kit Peters wrote:
    > Well, *I'm* only interested in numbers, but the larger project that I'm
    > working within covers all of Unicode.
    > On 2/17/06, Andrew West wrote:
    > > On 17/02/06, Kit Peters wrote:
    > > > 1) Is there a native Java way to retrieve the numeric values for these
    > > > characters (i.e. a way that doesn't involve me parsing Unihan.txt)?
    > >
    > > If you're only interested in numbers, why not parse the following
    > > files directly instead of UnicodeData.txt and Unihan.txt? They cover
    > > all characters defined as numbers by Unicode, including CJK
    > > ideographs.

