From: Jim Allan (firstname.lastname@example.org)
Date: Fri Aug 15 2003 - 16:12:36 EDT
Jill Ramonsky posted:
> What I mean is, it seems (to me) that there is a HUGE semantic difference
> between the hexadecimal digit thirteen, and the letter D.
There is also a HUGE semantic difference between D meaning the letter D
and Roman numeral D meaning 500.
But see http://www.unicode.org/versions/Unicode4.0.0/ch14.pdf:
<< *Roman Numerals.* The Roman numerals can be composed of sequences of
the appropriate Latin letters. Upper- and lowercase variants of the
Roman numerals through 12, plus L, C, D, and M, have been encoded for
compatibility with East Asian standards. >>
When the Unicode manual begins to talk about anything being encoded for
compatibility, it usually means that it was *only* encoded for
compatibility and otherwise would probably not have been encoded at all
in Unicode because it is not needed.
Note that the chart at http://www.unicode.org/charts/PDF/U2150.pdf
indicates compatibility decomposition of these characters to the regular
Latin letters.
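The compatibility decomposition can be inspected programmatically. The following is a minimal sketch, assuming Python's standard unicodedata module (whose data comes from a later Unicode version than 4.0); the variable names are illustrative only.

```python
import unicodedata

# U+216E ROMAN NUMERAL FIVE HUNDRED is the compatibility character
# that looks like the Latin capital letter D.
roman_500 = "\u216E"

# The decomposition field carries a <compat> tag followed by the
# code point(s) of the ordinary letters it maps to.
decomp = unicodedata.decomposition(roman_500)

# Compatibility normalization (NFKC) applies that mapping, so the
# special character folds to the regular letter.
folded = unicodedata.normalize("NFKC", roman_500)

# A multi-letter numeral such as U+2167 ROMAN NUMERAL EIGHT folds to
# the sequence of letters V, I, I, I.
folded_eight = unicodedata.normalize("NFKC", "\u2167")

print(unicodedata.name(roman_500), "|", decomp, "|", folded, "|", folded_eight)
```

After normalization nothing remains to distinguish the numeral from the letter, which is consistent with the characters existing only for compatibility.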
The letter _d_, though here lowercase, is also the symbol for _deci-_ in
metric abbreviations. See
_D_ also often means "digital" as in _D/A_ "digital to analog" or
_D-AMPS_ "Digital Advanced Mobile Phone System".
_D_ is listed at
meaning both "document" and "Pneumatic Post. Scott catalog number prefix
to identify stamps other than standard postage".
If Unicode even distinguished some of these uses (and similar special
uses for all letters in all scripts) by encoding them separately in
Unicode, what purpose would be served? The viewers would still only see
_D_ or _d_ as indeed they ought to, since that is what they should see
according to normal orthography and spelling.
Most users would not enter the new proper characters in any case. Even
now most fonts don't support the special Roman numeral characters, and
there is no need to support them. The standard Roman letter glyphs are
what are normally used.
Unicode doesn't attempt to distinguish meanings of symbols except when
forced to by compatibility with older character sets or in a few cases
where the same character in appearance is used sometimes as a "letter"
and sometimes as "punctuation" so that applications can determine the
proper beginnings and endings of words.
The semantics of the symbols is otherwise not Unicode's concern. Unicode
should not define whether 302D is a hex number or a product identifier
or a section identifier in a document or perhaps has some other meaning.
Encoding "D" with a different code won't help a reader of printed text
(or even displayed text) to know what is meant. A copy typist may not
know what is meant.
> I notice that there are Unicode properties "Hex_Digit" and "ASCII_Hex_Digit"
> which some Unicode characters possess. I may have missed it, but what I
> don't see in the charts is a mapping from characters having these property
> to the digit value that they represent. Is it assumed that the number of
> characters having the "Hex_Digit" properties is so small that implementation
> is trivial? That everyone knows it? Or have I just missed the mapping by
> looking in the wrong place?
0030..0039 ; ASCII_Hex_Digit # Nd  DIGIT ZERO..DIGIT NINE
0041..0046 ; ASCII_Hex_Digit # L&  LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER F
0061..0066 ; ASCII_Hex_Digit # L&  LATIN SMALL LETTER A..LATIN SMALL LETTER F
# Total code points: 22
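The listing above records only which code points have the property, not the value each one stands for. As a rough sketch of how an application might supply that mapping itself, assuming the three ranges are copied by hand from the listing (the names ASCII_HEX_RANGES, is_ascii_hex_digit and hex_digit_value are hypothetical, not part of any Unicode data file or library):

```python
# Inclusive code point ranges copied from the ASCII_Hex_Digit listing.
ASCII_HEX_RANGES = [(0x0030, 0x0039), (0x0041, 0x0046), (0x0061, 0x0066)]


def is_ascii_hex_digit(ch):
    """Return True if the single character ch falls in one of the ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ASCII_HEX_RANGES)


def hex_digit_value(ch):
    """Map one ASCII hex digit to its value 0..15, else raise ValueError."""
    if not is_ascii_hex_digit(ch):
        raise ValueError(f"not an ASCII hex digit: {ch!r}")
    # Python's built-in int() already knows the conventional values
    # for 0-9, A-F and a-f in base 16.
    return int(ch, 16)


print(hex_digit_value("D"), hex_digit_value("d"), hex_digit_value("7"))
```

The mapping is a convention of the application: the same "D" is still just the letter D to Unicode.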
The property ASCII_Hex_Digit is a convenience to allow applications to
identify one common use of "A", "B", "C", "D", "E" and "F" in accordance
with defined properties set out in some programming languages.
In fact it has also become common when using bases greater than 16 to
extend this convention so that one can have such a number as AW3Z₃₆ in
base 36.
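The extended convention (digits 0-9 followed by the letters A-Z as values 10 to 35) can be sketched as follows. This is an illustration only; parse_in_base is a hypothetical helper, and Python's built-in int() happens to follow the same convention for bases up to 36.

```python
import string

# "0".."9" then "A".."Z": the position of a character is its digit value.
DIGITS = string.digits + string.ascii_uppercase


def parse_in_base(text, base):
    """Interpret text as a number in the given base (2..36)."""
    if not 2 <= base <= 36:
        raise ValueError("base must be between 2 and 36")
    value = 0
    for ch in text:
        # Only plain ASCII letters and digits take part in the convention.
        digit = DIGITS.find(ch.upper()) if ch.isascii() else -1
        if digit < 0 or digit >= base:
            raise ValueError(f"{ch!r} is not a digit in base {base}")
        value = value * base + digit
    return value


# The same ordinary letters serve as digits in any base; only the
# stated base tells the reader (or the program) how to read them.
print(parse_in_base("AW3Z", 36))   # 508175
print(parse_in_base("302D", 16))   # 12333
```

The letters themselves carry no base information, so the base has to be stated alongside the number rather than encoded into the characters.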
To indicate hex numbers a subscripted base indicator or a leading "&H"
or the word "hex" or some other indicator of meaning is far more useful
to humans than a double encoding of the same characters according to
If you can't normally see the difference in text then Unicode normally
shouldn't encode any difference.