From: Asmus Freytag (firstname.lastname@example.org)
Date: Sat Jul 24 2010 - 17:50:00 CDT
On 7/24/2010 3:00 PM, Bill Poser wrote:
> On Sat, Jul 24, 2010 at 1:00 PM, Michael Everson <email@example.com> wrote:
>> Digits can be scattered randomly about the code space and it wouldn't make any difference.
> Having written a library for performing conversions between Unicode
> strings and numbers, I disagree. While it is not all that hard to deal
> with the case in which the characters for the digits are scattered
> about the code space, if they occupy a contiguous set of code points
> in order of their value as they do, e.g., in ASCII, it simplifies both
> the conversion itself and such tasks as identifying the numeral system
> of a numeric string and checking the validity of a string as a number
> in a particular numeral system.
> It may well be that adopting such a policy is not realistic, but there
> would be advantages to it if it were.
Michael is no programmer, hence he doesn't have a firsthand understanding of why programmers distinguish between character set mapping (normally requiring look-up tables) and digit conversion (normally done by offset calculations).
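To make that distinction concrete, here is a rough sketch in Python. It is not taken from Bill's library or from any existing implementation; the function names are made up for illustration, and the "scattered" code points are hypothetical Private Use Area values chosen only to show what a table-driven conversion looks like. The ASCII and Devanagari zero code points are the real ones.

```python
# Offset-based conversion: possible when the ten digits of a numeral
# system occupy consecutive code points in order of value.
ASCII_ZERO = 0x0030       # DIGIT ZERO
DEVANAGARI_ZERO = 0x0966  # DEVANAGARI DIGIT ZERO


def digit_value_by_offset(ch, zero):
    """Return the value 0-9 of ch, or None if ch lies outside the block."""
    value = ord(ch) - zero
    return value if 0 <= value <= 9 else None


def parse_by_offset(text, zero):
    """Parse a digit string whose digits all come from one contiguous block."""
    if not text:
        raise ValueError("empty string")
    total = 0
    for ch in text:
        value = digit_value_by_offset(ch, zero)
        if value is None:
            raise ValueError(f"not a digit in this numeral system: {ch!r}")
        total = total * 10 + value
    return total


# Table-based conversion: required when the digits are scattered.
# HYPOTHETICAL code points (Private Use Area), for illustration only.
SCATTERED_DIGITS = {
    "\uE005": 0, "\uE013": 1, "\uE002": 2, "\uE041": 3, "\uE00A": 4,
    "\uE030": 5, "\uE007": 6, "\uE021": 7, "\uE019": 8, "\uE00C": 9,
}


def parse_by_table(text, table):
    """Parse a digit string using an explicit character-to-value table."""
    if not text:
        raise ValueError("empty string")
    total = 0
    for ch in text:
        if ch not in table:
            raise ValueError(f"not a digit in this numeral system: {ch!r}")
        total = total * 10 + table[ch]
    return total
```

With contiguous digits, supporting another numeral system only takes one more constant (its zero), and the same range check doubles as validation. With scattered digits, every system needs its own ten-entry table.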
That said, there are enough programmers on the committees so that scattered encoding of digits, while not prevented, is at least not the method of choice.
The problem with making this a policy is that some scripts may not have a decimal place-value number system (or such use may not be documented) at the time of their encoding. That means a digit zero may not be known or documented.
However, a prudent encoding policy would be to leave a gap in that case, because there have been scripts for which use of a decimal place-value system was later discovered.