Re: Unihan numeric support in Java?

From: Markus Scherer (markus.icu@gmail.com)
Date: Fri Feb 17 2006 - 12:31:28 CST


    I assume it's an oversight that Java does not return the Han numeric
    values. They are Unicode property values for those characters. Due to
    the limited syntax of UnicodeData.txt, they are available only in
    Unihan.txt and, as Andrew said, in extracted/DerivedNumericValues.txt
    - the latter is small and easy to parse.
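
As a rough illustration of how little parsing the derived file needs, here is a minimal sketch in plain Java. It assumes each data line has the form "code point or range ; value", optionally followed by further ";"-separated fields and a trailing "#" comment, and that the value is either a decimal number or a fraction such as 1/2. That layout is an assumption and should be checked against the copy of DerivedNumericValues.txt you actually use. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch only: reads numeric values from lines in the assumed file layout. */
final class DerivedNumericValuesParser {

    /** Parses a value written as "3", "0.5" or "1/2" into a double. */
    static double parseValue(String text) {
        int slash = text.indexOf('/');
        if (slash < 0) {
            return Double.parseDouble(text);
        }
        double numerator = Double.parseDouble(text.substring(0, slash));
        double denominator = Double.parseDouble(text.substring(slash + 1));
        return numerator / denominator;
    }

    /** Maps each code point found in the lines to its numeric value. */
    static Map<Integer, Double> parse(List<String> lines) {
        Map<Integer, Double> values = new TreeMap<>();
        for (String line : lines) {
            // Drop the trailing "# ..." comment, then skip blank lines.
            int hash = line.indexOf('#');
            String data = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (data.isEmpty()) {
                continue;
            }
            String[] fields = data.split(";");
            if (fields.length < 2) {
                continue; // not a data line in the assumed layout
            }
            // Field 0 is one code point ("4E09") or a range ("2460..2462").
            String[] range = fields[0].trim().split("\\.\\.");
            int first = Integer.parseInt(range[0], 16);
            int last = Integer.parseInt(range[range.length - 1], 16);
            // Field 1 is assumed to hold the numeric value.
            double value = parseValue(fields[1].trim());
            for (int cp = first; cp <= last; cp++) {
                values.put(cp, value);
            }
        }
        return values;
    }
}
```

To try it, read the file with Files.readAllLines(Paths.get("DerivedNumericValues.txt")), pass the resulting list to parse, and look up a code point such as 0x4E09 in the returned map.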

    ICU4J's UCharacter class returns the numeric values for Han
    characters. You could just use ICU instead of rolling your own.

    http://icu.sourceforge.net/apiref/icu4j/com/ibm/icu/lang/UCharacter.html#getUnicodeNumericValue(int)
    http://www-306.ibm.com/software/globalization/icu/downloads.jsp

    Best regards,
    markus

    On 2/17/06, Kit Peters <popefelix@gmail.com> wrote:
    > Well, *I'm* only interested in numbers, but the larger project that I'm
    > working within covers all of Unicode.
    >
    > On 2/17/06, Andrew West <andrewcwest@gmail.com> wrote:
    > > On 17/02/06, Kit Peters <popefelix@gmail.com> wrote:
    > > > 1) Is there a native Java way to retrieve the numeric values for these
    > > > characters (i.e. a way that doesn't involve me parsing Unihan.txt)?
    > >
    > > If you're only interested in numbers, why not parse the following
    > > files directly, instead of UnicodeData.txt and Unihan.txt? They cover
    > > all characters defined as numbers by Unicode, including CJK
    > > ideographs.

    --
    Opinions expressed here may not reflect my company's positions unless
    otherwise noted.
    

