From: Jill.Ramonsky@Aculab.com
Date: Fri Aug 15 2003 - 10:11:07 EDT

    If the semantic difference between (for example) uppercase D and
    mathematical bold uppercase D was considered sufficiently great as to
    require a new codepoint, then I am tempted to wonder if the same might be
    considered true of hexadecimal digits.
    What I mean is, it seems (to me) that there is a HUGE semantic difference
    between the hexadecimal digit thirteen, and the letter D. The former is a
    numerical digit. The latter is a letter of the Basic Latin alphabet. The
    symbol in the middle of the word "hides" is semantically a letter. It forms
    part of a word. Whereas the symbol in the middle of the number "3AD29" (base
    16) is semantically a digit, having the numerical value thirteen. It forms
    part of a number (the number 240937, in fact). So far as I can see, every
    single character in the "3AD29" string should be in general category N*
    (either Nd or Nl).
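    To make the arithmetic concrete, here is a minimal Python sketch. The
    function name hex_string_value is made up for illustration. It evaluates
    the string positionally, then prints the general category Unicode
    currently assigns to each symbol.

```python
import unicodedata


def hex_string_value(text):
    """Evaluate a base-16 string positionally, one symbol at a time."""
    value = 0
    for ch in text:
        # int("D", 16) is 13: the letter is reused as a digit here
        value = value * 16 + int(ch, 16)
    return value


print(hex_string_value("3AD29"))  # 240937

# General category of each symbol as Unicode classifies it today:
# the decimal digits report Nd, while A and D report Lu (uppercase letter).
for ch in "3AD29":
    print(ch, unicodedata.category(ch))
```

    So the D in "3AD29" is classified exactly like the d in "hides", apart
    from case.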
    Sure, you can tell them apart by context, in most circumstances, in the same
    way that you can tell the difference between a hyphen and a minus sign by
    context, but since the meanings are so clearly distinct, I wonder if there
    is a case for distinguishing hex digits from letters without requiring
    context.
    A few years back, when I was programming in assembler, the particular
    assembler I was using (can't remember which one, sorry) assumed that all
    numbers were hexadecimal - a reasonable assumption, given what it did.
    However - if the first digit was greater than nine, you had to supply a
    leading zero, so that the assembler could distinguish it from an identifier.
    If hex digits were characters distinct from letters, it wouldn't have needed
    to make that rule.
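    A rough Python sketch of that leading-zero rule follows. Since I can't
    name the assembler, this is a hypothetical tokenizer, and classify_token
    is an invented name, not any real tool's API.

```python
def classify_token(token):
    """Treat a token as a hex number only if it starts with a decimal digit.

    This mirrors the rule described above: a token such as "AD29" could be
    an identifier, so a number whose first hex digit exceeds nine must be
    written with a leading zero ("0AD29").
    """
    if token[0] in "0123456789":
        return ("number", int(token, 16))
    return ("identifier", token)


print(classify_token("3AD29"))  # ('number', 240937)
print(classify_token("0AD29"))  # ('number', 44329)
print(classify_token("AD29"))   # ('identifier', 'AD29')
```

    With dedicated hex-digit characters, the first branch could test the
    character itself instead of depending on where it sits in the token.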
    I notice that there are Unicode properties "Hex_Digit" and "ASCII_Hex_Digit"
    which some Unicode characters possess. I may have missed it, but what I
    don't see in the charts is a mapping from characters having these properties
    to the digit value that they represent. Is it assumed that the number of
    characters having the "Hex_Digit" properties is so small that implementation
    is trivial? That everyone knows it? Or have I just missed the mapping by
    looking in the wrong place?
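    For what it's worth, here is how small such a mapping would be, as a
    Python sketch. It assumes the Hex_Digit set is the ASCII digits and
    letters A-F/a-f plus their fullwidth counterparts (which is my reading
    of PropList.txt). The names build_hex_digit_table, HEX_VALUE and
    parse_hex are invented for illustration.

```python
def build_hex_digit_table():
    """Map each assumed Hex_Digit character to its numeric value."""
    # (first code point of the run, value of that first character, run length)
    runs = [
        (0x0030, 0, 10),   # DIGIT ZERO .. DIGIT NINE
        (0x0041, 10, 6),   # LATIN CAPITAL LETTER A .. F
        (0x0061, 10, 6),   # LATIN SMALL LETTER A .. F
        (0xFF10, 0, 10),   # FULLWIDTH DIGIT ZERO .. NINE
        (0xFF21, 10, 6),   # FULLWIDTH LATIN CAPITAL LETTER A .. F
        (0xFF41, 10, 6),   # FULLWIDTH LATIN SMALL LETTER A .. F
    ]
    table = {}
    for first, value, length in runs:
        for offset in range(length):
            table[chr(first + offset)] = value + offset
    return table


HEX_VALUE = build_hex_digit_table()


def parse_hex(text):
    """Evaluate a string of Hex_Digit characters; raises KeyError otherwise."""
    result = 0
    for ch in text:
        result = result * 16 + HEX_VALUE[ch]
    return result


print(len(HEX_VALUE))            # 44
print(parse_hex("3AD29"))        # 240937
print(HEX_VALUE["\uFF24"])       # 13 (FULLWIDTH LATIN CAPITAL LETTER D)
```

    Under that assumption the whole table has 44 entries, 22 of which are
    ASCII.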
    And incidentally, from a mathematician's point of view - or indeed a
    programmer's point of view - there is really no semantic difference between
    uppercase hex digit thirteen and lowercase hex digit thirteen, any more than
    there is a semantic difference between uppercase hex digit three and
    lowercase hex digit three. It is only because we re-use the letters of the
    alphabet to fill this semantic void that the artificial distinction arises.
    (I think the Romans had this problem. Unicode does provide upper and
    lowercase variants of Roman numerals, but then again, all Roman numerals
    are cased (apart from the really big ones), so maybe that's irrelevant.)
    Thoughts anyone?
