From: Jill.Ramonsky@Aculab.com
Date: Fri Aug 15 2003 - 10:11:07 EDT

If the semantic difference between (for example) uppercase D and
mathematical bold uppercase D was considered sufficiently great so as to
require a new codepoint, then I am tempted to wonder if the same might be

What I mean is, it seems (to me) that there is a HUGE semantic difference
between the hexadecimal digit thirteen, and the letter D. The former is a
numerical digit. The latter is a letter of the Basic Latin alphabet. The
symbol in the middle of the word "hides" is semantically a letter. It forms
part of a word. Whereas the symbol in the middle of the number "3AD29" (base
16) is semantically a digit, having the numerical value thirteen. It forms
part of a number (the number 240937, in fact). So far as I can see, every
single character in the "3AD29" string should be in general category N*
(either Nd or Nl).
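
To make the arithmetic and the category point concrete, here is a minimal
sketch in Python. The helper name hex_value is made up for illustration; the
category lookup uses the standard unicodedata module. It checks the value of
"3AD29" and shows what general category each of its characters currently has.

```python
import unicodedata

def hex_value(text):
    """Positional base-16 evaluation (hypothetical helper, for illustration)."""
    total = 0
    for ch in text:
        # int(ch, 16) gives the value of one ASCII hex digit, e.g. "D" -> 13
        total = total * 16 + int(ch, 16)
    return total

value = hex_value("3AD29")
print(value)  # 240937

# General category of each character in the string, as Unicode assigns it today.
categories = [(ch, unicodedata.category(ch)) for ch in "3AD29"]
print(categories)
# The digits report Nd; "A" and "D" report Lu (uppercase letter), not N*.
```

So the string as a whole has a numerical value, but only three of its five
characters are classified as numeric.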

Sure, you can tell them apart by context, in most circumstances, in the same
way that you can tell the difference between a hyphen and a minus sign by
context, but since the meanings are so clearly distinct, I wonder if there
is a case for distinguishing hex digits from letters without requiring
context.

A few years back, when I was programming in assembler, the particular
assembler I was using (can't remember which one, sorry) assumed that all
numbers were hexadecimal - a reasonable assumption, given what it did.
However - if the first digit was greater than nine, you had to supply a
leading zero, so that the assembler could distinguish it from an identifier.
If hex digits were characters distinct from letters, it wouldn't have needed
to make that rule.
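
The leading-zero rule can be sketched roughly as follows. This is not the
actual assembler's logic (I can't reconstruct that); the function name
classify and the token shapes are assumptions, chosen only to show why the
rule is needed when hex digits and letters share codepoints.

```python
import string

HEX_DIGITS = set(string.hexdigits)  # 0-9, a-f, A-F

def classify(token):
    """Hypothetical token classifier for an all-hexadecimal assembler.

    A token counts as a number only if it starts with a decimal digit and
    consists entirely of hex digits; anything else is treated as an
    identifier. A real assembler would probably reject something like
    "3XY" instead of calling it an identifier.
    """
    if token and token[0] in string.digits and all(c in HEX_DIGITS for c in token):
        return ("number", int(token, 16))
    return ("identifier", token)

print(classify("AD29"))   # ('identifier', 'AD29')
print(classify("0AD29"))  # ('number', 44329)
print(classify("3AD29"))  # ('number', 240937)
```

Without the leading zero, "AD29" cannot be told apart from a label of the
same spelling, which is exactly the ambiguity described above.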

I notice that there are Unicode properties "Hex_Digit" and "ASCII_Hex_Digit"
which some Unicode characters possess. I may have missed it, but what I
don't see in the charts is a mapping from characters having these properties
to the digit values that they represent. Is it assumed that the number of
characters having the "Hex_Digit" property is so small that implementation
is trivial? That everyone knows it? Or have I just missed the mapping by
looking in the wrong place?
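
For what it's worth, here is how small such a mapping would be if written out
by hand. This sketch assumes that the Hex_Digit set consists of the ASCII
forms (0-9, A-F, a-f) plus their fullwidth compatibility forms, which is my
reading of PropList.txt; the function name build_hex_digit_values is made up.
Python's unicodedata module does not expose Hex_Digit, so the ranges are
hard-coded.

```python
def build_hex_digit_values():
    """Map each assumed Hex_Digit character to its digit value (0..15)."""
    table = {}
    # Decimal digits: ASCII 0-9 and FULLWIDTH DIGIT ZERO..NINE
    for base in (0x0030, 0xFF10):
        for i in range(10):
            table[chr(base + i)] = i
    # Letters A-F and a-f: ASCII and fullwidth, upper and lower case
    for base in (0x0041, 0x0061, 0xFF21, 0xFF41):
        for i in range(6):
            table[chr(base + i)] = 10 + i
    return table

HEX_DIGIT_VALUES = build_hex_digit_values()

print(len(HEX_DIGIT_VALUES))       # 44
print(HEX_DIGIT_VALUES["D"])       # 13
print(HEX_DIGIT_VALUES["d"])       # 13
print(HEX_DIGIT_VALUES["\uFF24"])  # 13 (FULLWIDTH LATIN CAPITAL LETTER D)
```

Under that assumption the whole table is 44 entries, and the uppercase and
lowercase forms of each letter map to the same value.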

And incidentally, from a mathematician's point of view - or indeed a
programmer's point of view - there is really no semantic difference between
uppercase hex digit thirteen and lowercase hex digit thirteen, any more than
there is a semantic difference between uppercase hex digit three and
lowercase hex digit three. It is only because we re-use the letters of the
alphabet to fill this semantic void that the artificial distinction arises.
(I think the Romans had this problem. Unicode does provide upper and
lowercase variants of Roman numerals, but then again, all Roman numerals are
cased (apart from the really big ones), so maybe that's irrelevant.)

Thoughts anyone?

Jill
