From: Jukka K. Korpela (email@example.com)
Date: Fri Jan 13 2006 - 15:11:20 CST
On Fri, 13 Jan 2006, Rick McGowan wrote:
> Kit Peters asked,
>> Can someone provide me a definitive list of all Unicode digits?
> You can make one yourself. Download the files from the latest UCD and look
> for "DIGIT". What you want, for starters, is probably the set of
> everything that has a value in the "decimal digit" field of the
> UnicodeData.txt file.
However, the more general concept of digit covers some other characters
too, such as superscript digits, which are counted as digits but may need
special treatment. See
Technically, you would consider the 8th field of each entry (line), and
if it is nonempty, the character is a digit. (The field is labeled "(7)"
in the UCD.html document, but that's because it does not count the first
field, the Unicode number.)
In Perl (assuming you have a local copy of UnicodeData.txt):
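$dbfile = 'UnicodeData.txt';
open(DB,"<$dbfile") || die "Can't open database file $dbfile $!";
while(<DB>) {
  @entry = split(';',$_);
  # $entry[7] is the 8th field (digit value); the fields printed below,
  # code number and name, are a reconstruction of the garbled original
  if($entry[7] ne '') {
    print $entry[0], " ", $entry[1], "\n"; }}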
(The results, when using the current database, are at
Depending on the programming environment, you might have a built-in
function for determining whether a character is a digit. The function may
or may not be up to date, i.e. correspond to the newest Unicode version.
Beware, however, that the isDigit function in java.lang.Character
tests for _decimal_ digits only (in the Unicode sense).
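To see the same distinction in Perl without parsing the file yourself, here is a minimal sketch that assumes the Unicode::UCD module bundled with Perl. The helper name digit_kind and the sample code points are my own choices for illustration. It also prints the Unicode version that the installation knows about, which is how you can check whether it is up to date.

```perl
use strict;
use warnings;
use Unicode::UCD qw(charinfo);

# Version of the Unicode database bundled with this Perl installation.
print "Unicode version: ", Unicode::UCD::UnicodeVersion(), "\n";

# True if a UnicodeData.txt field is present and nonempty.
sub nonempty {
    my ($value) = @_;
    return defined($value) && length($value);
}

# Hypothetical helper: classify a code point by the 'decimal' and 'digit'
# fields that charinfo() takes from UnicodeData.txt.
sub digit_kind {
    my ($cp) = @_;
    my $info = charinfo($cp);
    return 'unassigned' unless $info && %$info;
    return 'decimal digit' if nonempty($info->{decimal});
    return 'digit only'    if nonempty($info->{digit});
    return 'not a digit';
}

# Sample code points: DIGIT FIVE, ARABIC-INDIC DIGIT THREE,
# SUPERSCRIPT TWO, LATIN CAPITAL LETTER A.
for my $cp (0x0035, 0x0663, 0x00B2, 0x0041) {
    printf "U+%04X %s\n", $cp, digit_kind($cp);
}
```

With this, U+0035 and U+0663 come out as decimal digits, while U+00B2 (SUPERSCRIPT TWO) is a digit only in the broader sense, so a test for decimal digits alone would reject it.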
-- Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/