Re: Hexadecimal

From: Jim Allan (jallan@smrtytrek.com)
Date: Fri Aug 15 2003 - 16:12:36 EDT

    Jill Ramonsky posted:

    > What I mean is, it seems (to me) that there is a HUGE semantic difference
    > between the hexadecimal digit thirteen, and the letter D.

    Yes.

    There is also a HUGE semantic difference between D meaning the letter D
    and Roman numeral D meaning 500.

    But see http://www.unicode.org/versions/Unicode4.0.0/ch14.pdf:

    << *Roman Numerals.* The Roman numerals can be composed of sequences of
    the appropriate Latin letters. Upper- and lowercase variants of the
    Roman numerals through 12, plus L, C, D, and M, have been encoded for
    compatibility with East Asian standards. >>

    When the Unicode manual speaks of anything being encoded for
    compatibility, it usually means that it was *only* encoded for
    compatibility and otherwise would probably not have been encoded in
    Unicode at all, because it is not needed.

    Note that the chart at http://www.unicode.org/charts/PDF/U2150.pdf
    indicates compatibility decompositions of these characters to the
    regular Latin letters.
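
    A quick sketch in Python (nothing here beyond the standard unicodedata
    module; the example is mine, not taken from the chart) shows that
    decomposition at work:

        import unicodedata

        # U+216E ROMAN NUMERAL FIVE HUNDRED has a compatibility
        # decomposition to U+0044 LATIN CAPITAL LETTER D.
        print(unicodedata.name("\u216E"))               # ROMAN NUMERAL FIVE HUNDRED
        print(unicodedata.decomposition("\u216E"))      # <compat> 0044
        print(unicodedata.normalize("NFKC", "\u216E"))  # D

        # Likewise for L, C and M (U+216C, U+216D, U+216F).
        print(unicodedata.normalize("NFKC", "\u216C\u216D\u216F"))  # LCM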

    The letter _d_, though here lowercase, is also the symbol for _deci-_ in
    metric abbreviations. See
    http://www.geocities.com/Athens/Thebes/5118/metric.htm.

    _D_ also often means "digital" as in _D/A_ "digital to analog" or
    _D-AMPS_ "Digital Advanced Mobile Phone System".

    _D_ is listed at
    http://www.geocities.com/malaysiastamp/info/abbreviationd.html as
    meaning both "document" and "Pneumatic Post. Scott catalog number prefix
    to identify stamps other than standard postage".

    Even if Unicode distinguished some of these uses (and similar special
    uses for all letters in all scripts) by encoding them separately, what
    purpose would be served? Readers would still see only _D_ or _d_, as
    indeed they ought to, since that is what normal orthography and
    spelling call for.

    Most users would not enter the new proper characters in any case. Even
    now most fonts don't support the special Roman numeral characters, and
    there is no need to support them. The standard Roman letter glyphs are
    what are normally used.

    Unicode doesn't attempt to distinguish meanings of symbols except when
    forced to by compatibility with older character sets, or in a few cases
    where the same character in appearance is used sometimes as a "letter"
    and sometimes as "punctuation", so that applications can determine the
    proper beginnings and endings of words.
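
    One concrete illustration of that last point (my own example, not one
    cited in the standard): U+02BC MODIFIER LETTER APOSTROPHE and U+2019
    RIGHT SINGLE QUOTATION MARK look alike, but they carry different
    General_Category values, which is what lets word-breaking code treat
    one as part of a word and the other as punctuation:

        import unicodedata

        # Two apostrophe-like characters: one a letter, one punctuation.
        for ch in ("\u02BC", "\u2019"):
            print("U+%04X" % ord(ch), unicodedata.name(ch),
                  unicodedata.category(ch))
        # U+02BC MODIFIER LETTER APOSTROPHE Lm
        # U+2019 RIGHT SINGLE QUOTATION MARK Pf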

    The semantics of the symbols are otherwise not Unicode's concern.
    Unicode should not define whether 302D is a hex number, a product
    identifier, a section identifier in a document, or something else
    entirely. Encoding "D" with a different code won't help a reader of
    printed text (or even displayed text) to know what is meant. A copy
    typist may not know what is meant either.

    > I notice that there are Unicode properties "Hex_Digit" and "ASCII_Hex_Digit"
    > which some Unicode characters possess. I may have missed it, but what I
    > don't see in the charts is a mapping from characters having these property
    > to the digit value that they represent. Is it assumed that the number of
    > characters having the "Hex_Digit" properties is so small that implementation
    > is trivial? That everyone knows it? Or have I just missed the mapping by
    > looking in the wrong place?

    See http://www.unicode.org/Public/UNIDATA/PropList.txt:

    <<
    0030..0039 ; ASCII_Hex_Digit # Nd [10] DIGIT ZERO..DIGIT NINE
    0041..0046 ; ASCII_Hex_Digit # L& [6] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER F
    0061..0066 ; ASCII_Hex_Digit # L& [6] LATIN SMALL LETTER A..LATIN SMALL LETTER F

    # Total code points: 22
    >>

    The property ASCII_Hex_Digit is a convenience that allows applications
    to identify one common use of "A", "B", "C", "D", "E" and "F", in
    accordance with definitions set out in some programming languages.
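
    Python's unicodedata module does not expose binary properties such as
    ASCII_Hex_Digit, so as a rough sketch one can simply hard-code the 22
    code points listed above; the digit values the question asks about then
    come straight from int() with base 16 (the helper names below are mine,
    purely illustrative):

        # The 22 ASCII_Hex_Digit code points from PropList.txt above.
        ASCII_HEX_DIGITS = set("0123456789ABCDEFabcdef")

        def is_ascii_hex_digit(ch):
            return ch in ASCII_HEX_DIGITS

        def hex_digit_value(ch):
            # int() with base 16 already embodies the letter-to-value
            # mapping: "D" and "d" both give 13.
            return int(ch, 16)

        print(is_ascii_hex_digit("D"), hex_digit_value("D"))  # True 13
        print(is_ascii_hex_digit("G"))                        # False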

    In fact, when using bases greater than 16, it has become common to
    extend this convention, so that one can have a number such as AW3Zā‚ƒā‚† in
    base-36 notation.
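
    For what it's worth, Python's built-in int() already accepts bases up
    to 36, so that example can be checked directly (the arithmetic is mine,
    not anything from a standard):

        # A..Z stand for 10..35 once the base exceeds 16.
        print(int("AW3Z", 36))  # 508175
        # = 10*36**3 + 32*36**2 + 3*36 + 35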

    To indicate hex numbers, a subscripted base indicator, a leading "&H",
    the word "hex", or some other indicator of meaning is far more useful
    to humans than a double encoding of the same characters according to
    meaning.
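
    Put another way, the indicator is just something an application strips
    off before interpreting the same ordinary characters; the parse_hex
    helper below is a made-up sketch of that idea, not any standard
    routine:

        def parse_hex(text):
            # Accept a few common human-readable hex markers:
            # "0x302D", "&H302D" (BASIC style), or "302D hex".
            t = text.strip()
            if t.lower().startswith("0x"):
                t = t[2:]
            elif t.upper().startswith("&H"):
                t = t[2:]
            elif t.lower().endswith("hex"):
                t = t[:-3].strip()
            return int(t, 16)

        print(parse_hex("0x302D"), parse_hex("&H302D"), parse_hex("302D hex"))
        # 12333 12333 12333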

    If you can't normally see the difference in text, then Unicode normally
    shouldn't encode any difference.

    Jim Allan


