Re: Definitive list of Unicode digits

From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Fri Jan 13 2006 - 15:11:20 CST


    On Fri, 13 Jan 2006, Rick McGowan wrote:

    > Kit Peters asked,
    >
    >> Can someone provide me a definitive list of all Unicode digits?
    >
    > You can make one yourself. Download the files from the latest UCD and look
    > for "DIGIT". What you want, for starters, is probably the set of
    > everything that has a value in the "decimal digit" field of the
    > UnicodeData.txt file.

    However, the more general concept of digit covers some other characters
    too, such as superscript digits, which are counted as digits but may need
    special treatment. See
    http://www.unicode.org/Public/UNIDATA/UCD.html#Numeric_Type
    Technically, you would consider the 8th field of each entry (line), the
    digit value, and if it is nonempty, the character is a digit. (The field
    is labeled "(7)" in the UCD.html document, but that's because the
    numbering there does not count the first field, the Unicode number.)

    In Perl (assuming you have a local copy of UnicodeData.txt):

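    use strict; use warnings;

    my $dbfile = 'UnicodeData.txt';
    open(my $db, '<', $dbfile)
        or die "Can't open database file $dbfile: $!";
    while (my $line = <$db>) {
        my @entry = split(/;/, $line);
        # Test against '' rather than truth: the digit value "0" is false in Perl.
        if ($entry[7] ne '') {
            print $entry[0], " ", $entry[1], "\n";
        }
    }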

    (The results, when using the current database, are at
    http://www.cs.tut.fi/~jkorpela/unicode/digits.txt )
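
    To see how the digit field relates to its two neighbours in
    UnicodeData.txt (the decimal digit value before it and the numeric value
    after it), here is a minimal sketch. The helper name numeric_type() is
    made up for illustration, and the sample lines are written in the
    UnicodeData.txt format rather than read from a local copy, so check them
    against the database version you actually use.

```perl
use strict;
use warnings;

# Classify one UnicodeData.txt line by its three numeric fields
# (0-based indexes 6, 7 and 8). numeric_type() is a hypothetical helper.
sub numeric_type {
    my ($line) = @_;
    chomp $line;
    # A limit of -1 makes split keep trailing empty fields.
    my @f = split /;/, $line, -1;
    return 'decimal' if $f[6] ne '';   # decimal digit value present
    return 'digit'   if $f[7] ne '';   # digit value, but no decimal digit value
    return 'numeric' if $f[8] ne '';   # numeric value only
    return 'none';
}

# Sample lines in the UnicodeData.txt format (assumed, not read from a file).
my @samples = (
    '0030;DIGIT ZERO;Nd;0;EN;;0;0;0;N;;;;;',
    '00B2;SUPERSCRIPT TWO;No;0;EN;<super> 0032;;2;2;N;SUPERSCRIPT DIGIT TWO;;;;',
    '00BD;VULGAR FRACTION ONE HALF;No;0;ON;<fraction> 0031 2044 0032;;;1/2;N;FRACTION ONE HALF;;;;',
    '0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;',
);

for my $line (@samples) {
    my ($code, $name) = split /;/, $line;
    printf "%s %-28s %s\n", $code, $name, numeric_type($line);
}
```

    Everything classified as 'decimal' or 'digit' here has a nonempty 8th
    field, so it would be printed by the script above; 'numeric' characters
    such as fractions would not.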

    Depending on the programming environment, you might have a built-in
    function for determining whether a character is a digit. The function may
    or may not be up to date, i.e. correspond to the newest Unicode version.
    Beware also that such functions may use a narrower definition: the
    isDigit method in java.lang.Character tests for _decimal_ digits only
    (in the Unicode sense), so it does not accept superscript digits.
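
    As an illustration of the difference between a built-in "decimal digit"
    test and the wider notion of digit, here is a small sketch using Perl's
    Unicode property matching. The two helper names are made up for this
    example, and the answers depend on the Unicode version bundled with your
    Perl, so treat it as a sketch rather than a reference.

```perl
use strict;
use warnings;

# Hypothetical helpers: test a one-character string against Unicode properties.

# General_Category Nd, i.e. decimal digits only.
sub is_decimal_digit {
    my ($ch) = @_;
    return $ch =~ /\A\p{Nd}\z/ ? 1 : 0;
}

# Decimal digits plus characters whose Numeric_Type is Digit
# (for example superscript digits).
sub is_digit_in_wider_sense {
    my ($ch) = @_;
    return $ch =~ /\A[\p{Nd}\p{Numeric_Type=Digit}]\z/ ? 1 : 0;
}

# DIGIT FIVE, ARABIC-INDIC DIGIT THREE, SUPERSCRIPT TWO, VULGAR FRACTION ONE HALF
for my $cp (0x0035, 0x0663, 0x00B2, 0x00BD) {
    printf "U+%04X decimal=%d wider=%d\n",
        $cp, is_decimal_digit(chr $cp), is_digit_in_wider_sense(chr $cp);
}
```

    The first helper corresponds to the narrower, decimal-only test; the
    second is closer to the "nonempty 8th field" criterion described above.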

    -- 
    Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
    

