From: John Cowan (cowan@mercury.ccil.org)
Date: Sat Aug 16 2003 - 15:19:33 EDT
Pim Blokland scripsit:
> Besides, your example is proof that the implementation can change;
> has to change. Where applications could use 8-bit characters to
> store hex digits in the old days, they now have to use 16-bit
> characters to keep up with Unicode...
You are confusing the *representation* of characters with the *choice*
of characters. The representation of characters for hex digits can and
does change: it can be ASCII, EBCDIC, or Unicode. The choice of
characters is fixed: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A/a, B/b, C/c, D/d, E/e,
F/f.
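
A minimal sketch of that distinction, assuming Python and its standard
"cp037" codec as a stand-in for EBCDIC (the variable names are only
illustrative). The stored bytes differ per representation, while the
characters and the number they spell do not:

```python
# The same sixteen hex-digit characters under three representations.
HEX_DIGITS = "0123456789ABCDEF"

ascii_bytes = HEX_DIGITS.encode("ascii")      # one byte per character
ebcdic_bytes = HEX_DIGITS.encode("cp037")     # EBCDIC code page 037, one byte each
utf16_bytes = HEX_DIGITS.encode("utf-16-be")  # Unicode as UTF-16, two bytes each

# Decoding each representation gives back the same characters.
decoded = {
    ascii_bytes.decode("ascii"),
    ebcdic_bytes.decode("cp037"),
    utf16_bytes.decode("utf-16-be"),
}

# The numeric value does not depend on how the digits were stored.
value_from_ebcdic = int(ebcdic_bytes.decode("cp037"), 16)
value_from_utf16 = int(utf16_bytes.decode("utf-16-be"), 16)
```

Here `decoded` collapses to a single string even though no two of the
byte sequences are equal.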
> > There is also a HUGE semantic difference between D meaning the
> > letter D and Roman numeral D meaning 500.
>
> and those have different code points! So you're saying Jill is
> right, right?
No. The Roman numeral characters are encoded solely for compatibility with
East Asian character sets. (The same is true of the KELVIN SIGN.)
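
The compatibility status is visible in the character database itself.
A small sketch, assuming Python's standard unicodedata module (the
constant names are mine): both characters decompose to the ordinary
Latin letter, even though the Roman numeral carries its own general
category and numeric value.

```python
import unicodedata

ROMAN_D = "\u216E"  # ROMAN NUMERAL FIVE HUNDRED
KELVIN = "\u212A"   # KELVIN SIGN

# The Roman numeral has a compatibility decomposition to LATIN CAPITAL LETTER D.
roman_decomposition = unicodedata.decomposition(ROMAN_D)
roman_folded = unicodedata.normalize("NFKC", ROMAN_D)

# KELVIN SIGN has a canonical decomposition to LATIN CAPITAL LETTER K,
# so even plain NFC normalization replaces it.
kelvin_folded = unicodedata.normalize("NFC", KELVIN)

# The properties do differ between the two "D" characters.
roman_category = unicodedata.category(ROMAN_D)  # letter-like number
latin_category = unicodedata.category("D")      # uppercase letter
roman_value = unicodedata.numeric(ROMAN_D)
```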
> What we're talking about is different general categories, different
> numeric values and even, oddly enough, different BiDi categories.
> Doesn't that qualify for creating new characters?
As a practical matter, going through all legacy texts (now including
legacy Unicode texts!) to disambiguate every instance of A-F/a-f between
alphabetic and hexadecimal uses would be infeasible. The justification
for not splitting off Turkish i and I from general Latin, despite their
unusual case mappings, is exactly the same.
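
To make both halves of that concrete, here is a rough sketch in Python.
The helper could_be_hex and the word list are hypothetical, made up for
illustration only. The first part shows that the text alone cannot say
whether a token is a word or a hex number; the second shows the Turkish
case mappings that plain Latin i and I do not share.

```python
import string


def could_be_hex(token: str) -> bool:
    """Return True if every character of token is a hex digit (hypothetical helper)."""
    return bool(token) and all(ch in string.hexdigits for ch in token)


# Ordinary English words that are also well-formed hex numbers.
words = ["face", "deed", "Bad", "cafe", "hello"]
ambiguous = [w for w in words if could_be_hex(w)]

# Turkish dotted capital I and dotless small i.
DOTTED_CAPITAL_I = "\u0130"
DOTLESS_SMALL_I = "\u0131"

# Python applies the default (locale-independent) Unicode case mappings.
lower_of_dotted = DOTTED_CAPITAL_I.lower()  # "i" followed by COMBINING DOT ABOVE
upper_of_dotless = DOTLESS_SMALL_I.upper()  # plain "I"
lower_of_plain = "I".lower()                # plain "i", not dotless
```

Every entry in `ambiguous` is both a valid word and a valid hex literal,
and nothing in the character data distinguishes the two readings.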
-- 
If you have ever wondered if you are in hell,         John Cowan
it has been said, then you are on a well-traveled     http://www.ccil.org/~cowan
road of spiritual inquiry.  If you are absolutely     http://www.reutershealth.com
sure you are in hell, however, then you must be       jcowan@reutershealth.com
on the Cross Bronx Expressway.          --Alan Feur, NYTimes, 2002-09-20