Re: Unihan number types and values

From: Kent Karlsson (kent.karlsson14@telia.com)
Date: Tue Nov 30 2010 - 07:36:02 CST

    On 2010-11-29 23:24, "Kenneth Whistler" <kenw@sybase.com> wrote:

    ...
    > they are quite often used in traditional numbering in
    > East Asia, which does not use decimal radix forms. Handling
    > Han numeric ideographs requires special processing to
    > parse numeric values correctly.

    CLDR and ICU have (some) support for that. See:
    http://www.unicode.org/cldr/trac/browser/trunk/common/rbnf/zh_Hant.xml
    http://www.unicode.org/cldr/trac/browser/trunk/common/rbnf/zh.xml
    http://www.unicode.org/cldr/trac/browser/trunk/common/rbnf/ja.xml

    The data in these files is used by the RBNF (rule-based number format)
    formatting and parsing APIs in ICU:
    http://icu-project.org/apiref/icu4c/classRuleBasedNumberFormat.html
    (None of these data files permits substituting "numerically equivalent"
    Han characters when parsing.)
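
    To illustrate the kind of special processing the quoted text refers to,
    here is a minimal Python sketch of reading a traditional (non-positional)
    Han numeral. It is not the RBNF algorithm and does not use the CLDR data.
    The function name parse_han_number, the lookup tables, and the small
    EQUIVALENTS map of "numerically equivalent" characters are all assumptions
    made for illustration.

```python
# Hypothetical tables for illustration; not taken from CLDR or the Unihan database.
DIGITS = {"零": 0, "〇": 0, "一": 1, "二": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
BIG_UNITS = {"万": 10**4, "萬": 10**4, "亿": 10**8, "億": 10**8}
# A few "numerically equivalent" characters, folded to the primary forms.
EQUIVALENTS = {"壹": "一", "貳": "二", "兩": "二", "两": "二"}


def parse_han_number(text):
    """Parse a non-negative integer written with multiplicative Han numerals."""
    if not text:
        raise ValueError("empty numeral")
    total = 0    # sum of completed sections (each closed by a big unit)
    section = 0  # value accumulated since the last big unit
    digit = 0    # digit waiting for a multiplier
    for ch in text:
        ch = EQUIVALENTS.get(ch, ch)
        if ch in DIGITS:
            digit = DIGITS[ch]
        elif ch in SMALL_UNITS:
            # A bare unit such as 十 counts as 1 x 10.
            section += (digit or 1) * SMALL_UNITS[ch]
            digit = 0
        elif ch in BIG_UNITS:
            total += (section + digit) * BIG_UNITS[ch]
            section = 0
            digit = 0
        else:
            raise ValueError(f"not a Han numeral character: {ch!r}")
    return total + section + digit


print(parse_han_number("三百二十一"))
print(parse_han_number("一万二千"))
```

    The sketch does no validation of ordering and does not handle stacked big
    units (such as 万 followed directly by 億), so it should be read as an
    outline of the multiply-then-add idea rather than a complete reader.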

    More on numbering systems in CLDR: see
    http://www.unicode.org/cldr/trac/browser/trunk/common/supplemental/numberingSystems.xml
    Just one decimal positional system using Han characters is supported
    (for now at least), called "hanidec". The names listed in
    numberingSystems.xml can be used in the ICU API to ask for the numbering
    system in question. (Some of the number spellout systems, including the
    Han character ones, can be requested that way; most cannot, and one must
    then use the RBNF API directly.)
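
    As a rough illustration of what a decimal positional system like "hanidec"
    amounts to, the following Python sketch maps ASCII digits to Han digits and
    back. It does not call ICU. The digit string 〇一二三四五六七八九 is assumed
    to match the "hanidec" entry in numberingSystems.xml, and the function
    names are hypothetical.

```python
# Assumed digit set for "hanidec"; check numberingSystems.xml before relying on it.
HANIDEC_DIGITS = "〇一二三四五六七八九"
TO_HANIDEC = str.maketrans("0123456789", HANIDEC_DIGITS)
FROM_HANIDEC = str.maketrans(HANIDEC_DIGITS, "0123456789")


def format_hanidec(n):
    """Write a non-negative integer with positional Han digits."""
    if n < 0:
        raise ValueError("negative numbers are not handled in this sketch")
    return str(n).translate(TO_HANIDEC)


def parse_hanidec(text):
    """Read a string made only of positional Han digits."""
    if not text or any(ch not in HANIDEC_DIGITS for ch in text):
        raise ValueError(f"not a hanidec digit string: {text!r}")
    return int(text.translate(FROM_HANIDEC))


print(format_hanidec(2010))
print(parse_hanidec("二〇一〇"))
```

    Because every character is a plain digit, this is a one-to-one character
    substitution, unlike the spellout forms, which need rule-based handling.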

        /Kent K


