From: Kenneth Whistler (kenw@sybase.com)
Date: Fri Aug 15 2003 - 15:10:43 EDT
Jill Ramonsky asked:
> Thoughts anyone?
Well, yes...
> If the semantic difference between (for example) uppercase D and
> mathematical bold uppercase D was considered sufficiently great so as to
> require a new codepoint, then I am tempted to wonder if the same might be
> considered true of hexadecimal digits.
No.
> So far as I can see, every
> single character in the "3AD29" string should be in general category N*
> (either Nd or Nl).
No. Doing so would trash other processing. And it would force the
disunification you are suggesting, which would not actually
have the effect of helping anyone process these hexadecimal strings,
but would instead break all existing implementations of them.
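To make the category point concrete, here is a minimal sketch in Python. It assumes the standard `unicodedata` module; the helper name `categories` is hypothetical. It lists the general category of each character in "3AD29": the digits are Nd, while A and D are Lu (uppercase letter), which is what casing, identifier parsing, and other text processing rely on.

```python
import unicodedata

def categories(text):
    """Return (character, general category) pairs for each character in text."""
    return [(ch, unicodedata.category(ch)) for ch in text]

# The letters used as hex digits keep their letter category (Lu),
# while the decimal digits are Nd.
print(categories("3AD29"))
# → [('3', 'Nd'), ('A', 'Lu'), ('D', 'Lu'), ('2', 'Nd'), ('9', 'Nd')]
```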
> Sure, you can tell them apart by context, in most circumstances, in the same
> way that you can tell the difference between a hyphen and a minus sign by
> context, but since the meanings are so clearly distinct, I wonder if there
> is a case for distinguishing hex digits from letters without requiring
> context.
Sure there is a case for it, but not for breaking the existing
encoding to do so.
> I notice that there are Unicode properties "Hex_Digit" and "ASCII_Hex_Digit"
> which some Unicode characters possess. I may have missed it, but what I
> don't see in the charts is a mapping from characters having these properties
> to the digit value that they represent.
There isn't. Any more than there is a chart showing all the numeric
values that Greek letters have, or all the numeric values that Hebrew
letters have, or all the numeric values that Runic letters have, ...
> Is it assumed that the number of
> characters having the "Hex_Digit" properties is so small that implementation
> is trivial?
Yes.
> That everyone knows it?
Yes.
> Or have I just missed the mapping by
> looking in the wrong place?
No.
Basically, thousands of implementations, for decades now,
have been using ASCII 0x30..0x39, 0x41..0x46, 0x61..0x66 to
implement hexadecimal numbers. That is also specified in
more than a few programming language standards and other
standards. Those characters map to Unicode U+0030..U+0039,
U+0041..U+0046, U+0061..U+0066.
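The mapping from those three ranges to digit values is small enough to write out directly. The following is only a sketch in Python, not taken from any particular standard or implementation; the function names `hex_digit_value` and `parse_hex` are hypothetical.

```python
def hex_digit_value(ch):
    """Return 0..15 for an ASCII hexadecimal digit, or None for anything else."""
    cp = ord(ch)
    if 0x30 <= cp <= 0x39:      # U+0030..U+0039: '0'..'9'
        return cp - 0x30
    if 0x41 <= cp <= 0x46:      # U+0041..U+0046: 'A'..'F'
        return cp - 0x41 + 10
    if 0x61 <= cp <= 0x66:      # U+0061..U+0066: 'a'..'f'
        return cp - 0x61 + 10
    return None

def parse_hex(text):
    """Parse a string of ASCII hex digits into an integer."""
    value = 0
    for ch in text:
        digit = hex_digit_value(ch)
        if digit is None:
            raise ValueError(f"not an ASCII hex digit: {ch!r}")
        value = value * 16 + digit
    return value

print(parse_hex("3AD29") == 0x3AD29)   # → True
```

Characters outside the three ranges, such as "G" or a mathematical bold D, yield None from `hex_digit_value` and are rejected by `parse_hex`.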
Disrupting that would be a case of breaking something which
is working -- even if it would have been more ideal if the
Latin script and mathematics had had a hexadecimal digit
system in the first place and not had to borrow Latin letters
to express numbers with radix > 10.
--Ken