> - must (should) distinguish between Japanese digits and western digits,
> and prevent the user from writing a number by mixing the two sets. But all
> number-related symbols are marked as "is digit", without distinction
> about "is digit, more precisely, a japanese digit"... so a routine that
> processes numeric input needs to check the symbols read to figure out if
> the number is being written in Japanese or western or whatsoever, and then
> apply the rules (e.g. 1 million is 100 x 10000 in Japanese
> ['hyaku-man']), so to assemble what the user types in and to get the
> internal binary representation of the number there are a lot of decisions
> to be made, and there is little support built into the Unicode set for
> making these decisions.
> It seems that the routine dealing with number input has to know by
> itself a lot of stuff that could be (at least in part) conveniently
> placed in the information tags of the Unicode format.
The Unicode Character Database (UnicodeData-Latest.txt) does not mark
*any* of the Han characters as "is digit". The table of numeric
properties in Chapter 4 of the Unicode Standard carefully distinguishes
between characters which are used as decimal radix digits and other
characters which have numerical values. This information is also
present in the Unicode Character Database, where it is available for
use in Unicode implementations.
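
As a rough illustration of the distinction drawn above, here is a minimal
sketch in Python. It assumes that Python's standard unicodedata module,
which exposes data from a far later version of the Unicode Character
Database than the one discussed here, is an acceptable stand-in. The
helper names classify and number_style are hypothetical and are not part
of any Unicode specification.

```python
import unicodedata

def classify(ch):
    """Classify one character by its numeric properties.

    'decimal' : usable as a decimal radix digit (e.g. '5')
    'numeric' : has a numeric value but is not a decimal digit (e.g. '五', '百')
    'other'   : carries no numeric value at all
    """
    if unicodedata.decimal(ch, None) is not None:
        return "decimal"
    if unicodedata.numeric(ch, None) is not None:
        return "numeric"
    return "other"

def number_style(text):
    """Return the single style used by text, rejecting mixed or invalid input.

    Hypothetical input policy: a number must be written entirely with
    decimal digits or entirely with non-digit numeric characters.
    """
    if not text:
        raise ValueError("empty input")
    styles = {classify(ch) for ch in text}
    if "other" in styles:
        raise ValueError("input contains a non-numeric character")
    if len(styles) > 1:
        raise ValueError("input mixes decimal digits with other numerals")
    return styles.pop()
```

With this sketch, number_style("100") reports decimal digits and
number_style("百万") reports non-digit numerals, while a mixed string
such as "1万" is rejected. Whether mixing should really be refused is an
application decision, not something the character properties dictate.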
For the publication of the Unicode Standard, Version 3.0, we are planning
to add a table of ideographic character numeric values, including
values for the many characters used for fraud-proof numeric representation
(e.g. for printing on checks).
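
To suggest how an implementation might consume such a table, here is a
small sketch that assembles an integer from a Japanese-style numeral such
as 'hyaku-man' (百万). It assumes that Python's unicodedata.numeric()
supplies the per-character values, and it handles only the multipliers up
to 億 (10^8). The function name parse_kanji_number is hypothetical, and
the sketch does not check that the characters appear in a well-formed
order.

```python
import unicodedata

def parse_kanji_number(text):
    """Assemble an integer from ideographic numerals (non-validating sketch)."""
    total = 0    # sum of completed groups (multiples of 10000 and above)
    section = 0  # value accumulated inside the current group (below 10000)
    digit = 0    # pending unit value (1-9) awaiting a multiplier

    for ch in text:
        value = unicodedata.numeric(ch, None)
        # Reject characters without an integral numeric value, and reject
        # decimal radix digits so that the two writing styles are not mixed.
        if (value is None or value != int(value)
                or unicodedata.decimal(ch, None) is not None):
            raise ValueError(f"unsupported character: {ch!r}")
        value = int(value)

        if value < 10:
            digit = value
        elif value < 10000:
            # Small multipliers: 十 (10), 百 (100), 千 (1000).
            # A bare multiplier counts as one times itself.
            section += (digit or 1) * value
            digit = 0
        else:
            # Large multipliers such as 万 (10000) and 億 (10^8) scale the
            # whole group read so far.
            total += ((section + digit) or 1) * value
            section = 0
            digit = 0

    return total + section + digit

print(parse_kanji_number("百万"))  # 100 x 10000
```

Because the values come from the character database rather than from a
hand-written table, the accounting forms used on checks should be
accepted too, to the extent that the runtime's database assigns them
numeric values.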
The actual software required for formatting and parsing numeric
data is outside the context of the Unicode Standard, of course, but
we hope that by providing standard tables of numeric values,
implementers will have a sounder basis for developing such numeric
--Ken Whistler, Technical Director, Unicode, Inc.
> I'd like to know your comments on the topic. Perhaps I'm wrong.