Re: Unihan numeric support in Java?

From: Markus Scherer (markus.icu@gmail.com)
Date: Fri Feb 17 2006 - 12:31:28 CST

Next message: Antoine Leca: "Re: non-positional numerals in Unicode?"

Previous message: Mount, Rob (Robert F): "Re: exponential/scientific notation in non-Western character"
In reply to: Kit Peters: "Re: Unihan numeric support in Java?"
Next in thread: Kit Peters: "Re: Unihan numeric support in Java?"
Reply: Kit Peters: "Re: Unihan numeric support in Java?"

I assume it's an oversight for Java to not return the Han numeric
values. They are Unicode property values for those characters. Due to
the limited syntax of UnicodeData.txt, they are available only in
Unihan.txt and, as Andrew said, in extracted/DerivedNumericValues.txt
- the latter is small and easy to parse.
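
To illustrate the "easy to parse" point, here is a minimal sketch of a reader for that file, using only the standard Java library. It assumes the usual UCD layout: a code point or a "first..last" range, a semicolon, the numeric value (a decimal or a fraction such as 1/2), optional further fields, and a trailing "#" comment. The class name NumericValueTable and the sample lines in main are hypothetical; check the layout against the copy of the file you actually ship before relying on it.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: maps code points to numeric values read from UCD-style lines. */
final class NumericValueTable {

    /** Parses lines in the assumed "code point(s) ; value ; ... # comment" layout. */
    static Map<Integer, Double> parse(Iterable<String> lines) {
        Map<Integer, Double> values = new TreeMap<>();
        for (String raw : lines) {
            // Drop the trailing comment, then skip blank or comment-only lines.
            int hash = raw.indexOf('#');
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            if (line.isEmpty()) {
                continue;
            }
            String[] fields = line.split(";");
            if (fields.length < 2 || fields[1].trim().isEmpty()) {
                continue; // not a data line in the assumed layout
            }
            // First field: a single code point or an inclusive "first..last" range, in hex.
            String[] range = fields[0].trim().split("\\.\\.");
            int first = Integer.parseInt(range[0].trim(), 16);
            int last = range.length > 1 ? Integer.parseInt(range[1].trim(), 16) : first;
            double value = parseValue(fields[1].trim());
            for (int cp = first; cp <= last; cp++) {
                values.put(cp, value);
            }
        }
        return values;
    }

    /** Accepts either a decimal ("3.0", "-0.5") or a simple fraction ("1/2"). */
    static double parseValue(String text) {
        int slash = text.indexOf('/');
        if (slash < 0) {
            return Double.parseDouble(text);
        }
        double numerator = Double.parseDouble(text.substring(0, slash).trim());
        double denominator = Double.parseDouble(text.substring(slash + 1).trim());
        return numerator / denominator;
    }

    public static void main(String[] args) {
        // Illustrative lines written for this sketch, not copied from the real file.
        Map<Integer, Double> table = parse(Arrays.asList(
                "# sample data",
                "4E09 ; 3.0 ; ; 3 # Lo CJK UNIFIED IDEOGRAPH-4E09",
                "00BD ; 1/2 # No VULGAR FRACTION ONE HALF"));
        for (Map.Entry<Integer, Double> e : table.entrySet()) {
            System.out.println(String.format("U+%04X", e.getKey()) + " -> " + e.getValue());
        }
    }
}
```

In real use you would feed it the lines of the file (for example via java.nio.file.Files.readAllLines) and look up code points rather than chars, since some numeric characters lie outside the BMP.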

ICU4J's UCharacter class returns the numeric values for Han
characters. You could just use ICU instead of rolling your own.

http://icu.sourceforge.net/apiref/icu4j/com/ibm/icu/lang/UCharacter.html#getUnicodeNumericValue(int)
http://www-306.ibm.com/software/globalization/icu/downloads.jsp

Best regards,
markus

On 2/17/06, Kit Peters <popefelix@gmail.com> wrote:
> Well, *I'm* only interested in numbers, but the larger project that I'm
> working within covers all of Unicode.
>
> On 2/17/06, Andrew West <andrewcwest@gmail.com> wrote:
> > On 17/02/06, Kit Peters <popefelix@gmail.com> wrote:
> > > 1) Is there a native Java way to retrieve the numeric values for these
> > > characters (i.e. a way that doesn't involve me parsing Unihan.txt)?
> >
> > If you're only interested in numbers, why not parse the following
> > files directly, instead of UnicodeData.txt and Unihan.txt? They cover
> > all characters defined as numbers by Unicode, including CJK
> > ideographs.

--
Opinions expressed here may not reflect my company's positions unless
otherwise noted.


This archive was generated by hypermail 2.1.5 : Fri Feb 17 2006 - 12:40:08 CST