Re: Definitive list of Unicode digits

From: Jukka K. Korpela
Date: Fri Jan 13 2006 - 15:11:20 CST

    On Fri, 13 Jan 2006, Rick McGowan wrote:

    > Kit Peters asked,
    >> Can someone provide me a definitive list of all Unicode digits?
    > You can make one yourself. Download the files from the latest UCD and look
    > for "DIGIT". What you want, for starters, is probably the set of
    > everything that has a value in the "decimal digit" field of the
    > UnicodeData.txt file.

    However, the more general concept of digit covers some other characters
    too, such as superscript digits, which are counted as digits but may need
    special treatment. See
    Technically, you would consider the 8th field of each entry (line), and
    if it is nonempty, the character is a digit. (The field is labeled "(7)"
    in the UCD.html document, but that's because the numbering there starts
    from 0 for the first field, the Unicode number.) The "decimal digit"
    field mentioned above is the 7th one, and it is nonempty for decimal
    digits only.

    In Perl (assuming you have a local copy of UnicodeData.txt):

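    use strict;
    use warnings;
    my $dbfile = 'UnicodeData.txt';
    open(my $db, '<', $dbfile) or die "Can't open database file $dbfile: $!";
    while (my $line = <$db>) {
        my @entry = split(/;/, $line);
        # Field 8 (index 7) holds the digit value; compare with '' because
        # a plain truth test would skip the value "0" (e.g. DIGIT ZERO).
        if (defined $entry[7] && $entry[7] ne '') {
            print $entry[0], " ", $entry[1], "\n";
        }
    }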

    (The results, when using the current database, are at )
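
    To make the difference between the "decimal digit" field and the more
    general digit field concrete, here is a rough Python sketch. It does not
    read the real file; SAMPLE holds a few entries written in the
    UnicodeData.txt format, and the names SAMPLE and classify are made up
    for this illustration only.

```python
# A few entries in the UnicodeData.txt format (semicolon-separated fields).
# This is an illustrative sample, not the full database.
SAMPLE = """\
0032;DIGIT TWO;Nd;0;EN;;2;2;2;N;;;;;
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
00B2;SUPERSCRIPT TWO;No;0;EN;<super> 0032;;2;2;N;SUPERSCRIPT DIGIT TWO;;;;
00BD;VULGAR FRACTION ONE HALF;No;0;ON;<fraction> 0031 2044 0032;;;1/2;N;FRACTION ONE HALF;;;;
"""


def classify(lines):
    """Split entries into decimal digits and other digits.

    Returns two lists of (code, name) pairs. Hypothetical helper.
    """
    decimal_digits, other_digits = [], []
    for line in lines:
        fields = line.rstrip("\n").split(";")
        if len(fields) < 9:
            continue  # not a complete entry
        code, name = fields[0], fields[1]
        decimal_value = fields[6]  # 7th field: decimal digit value
        digit_value = fields[7]    # 8th field: digit value
        if decimal_value != "":
            decimal_digits.append((code, name))
        elif digit_value != "":
            other_digits.append((code, name))
    return decimal_digits, other_digits


decimal_digits, other_digits = classify(SAMPLE.splitlines())
print("decimal digits:", decimal_digits)
print("digits that are not decimal digits:", other_digits)
```

    With this sample, DIGIT TWO lands in the first list and SUPERSCRIPT TWO
    in the second; the letter and the fraction are in neither, since the
    fraction has a value only in the 9th (numeric) field.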

    Depending on the programming environment, you might have a built-in
    function for determining whether a character is a digit. The function may
    or may not be up to date, i.e. correspond to the newest Unicode version.
    Beware, however, that the isDigit function in java.lang.Character
    tests for _decimal_ digits only (in the Unicode sense).
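
    As one example of such built-in support, Python's standard unicodedata
    module exposes both properties, so the distinction can be checked
    directly. This is only a sketch: the helper name describe is made up
    here, and the answers depend on the Unicode version the interpreter was
    built with (reported by unicodedata.unidata_version).

```python
import unicodedata


def describe(ch):
    """Collect the digit-related properties of one character (hypothetical helper)."""
    return {
        "name": unicodedata.name(ch, "<unnamed>"),
        "decimal": unicodedata.decimal(ch, None),  # decimal digit value, or None
        "digit": unicodedata.digit(ch, None),      # digit value, or None
        "isdecimal": ch.isdecimal(),               # decimal digits only
        "isdigit": ch.isdigit(),                   # also e.g. superscript digits
    }


# ASCII 2, SUPERSCRIPT TWO, ARABIC-INDIC DIGIT TWO, and a letter for contrast.
for ch in ("2", "\u00b2", "\u0662", "A"):
    print(f"U+{ord(ch):04X}", describe(ch))

print("Unicode version of this Python:", unicodedata.unidata_version)
```

    Here SUPERSCRIPT TWO has a digit value but no decimal digit value, so a
    test for decimal digits only (like the Java one mentioned above) would
    reject it.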

    Jukka "Yucca" Korpela
