From: Jim Allan (jallan@smrtytrek.com)
Date: Mon Nov 10 2003 - 14:02:09 EST

• Next message: John Hudson: "Re: Ciphers (Was: Berber/Tifinagh)"

Jim Ramonsky posted:

> I am not the one who has not thought it through. There _is_ no
> difference between decimal 7 and hex 7. They are the same digit. File777
> sorts before File999 in _ALL_ radices.

Exactly.

So mixed hex and decimal numbers will not sort or compare properly under
a natural-sort *string* comparison, even if one created clones of the
alpha characters carrying numeric values.
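The failure is easy to demonstrate with an ordinary natural-sort key (a minimal sketch; `natural_key` is an illustrative name, not anything from the thread):

```python
import re

def natural_key(s):
    # Split into alternating text/digit runs; digit runs compare numerically.
    # Digit runs are necessarily read as decimal -- the sort has no way to
    # know that the "A" in "FileA" is meant as the hex digit with value ten.
    return [int(run) if run.isdigit() else run
            for run in re.split(r'(\d+)', s)]

names = ["File9", "FileA", "File10"]
# Read as hex, the intended order is File9 < FileA (10) < File10 (16),
# but the natural sort treats "A" as a letter and the digits as decimal:
print(sorted(names, key=natural_key))  # → ['File9', 'File10', 'FileA']
```

The digits themselves carry no radix information, so no string comparison, however clever, can recover the intended hex ordering.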

Why then use a natural sort at all?

If you want a natural sort using a mixed alpha and numeric string which
may use multiple bases, a reasonable procedure might be to use the
Unicode subscript numbers as base markers.

Upon reaching one of these, the parser evaluates the subscript digits to
create a decimal number, then goes backward until it comes to the first
character that is not a digit in the base identified by that decimal
number. It can then simply zero-extend the number for sorting or
comparison, or a binary value can be used for sorting or comparison if
required.
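A minimal sketch of that procedure, assuming the Unicode subscript digits U+2080..U+2089 as base markers (e.g. "1A" followed by subscript "16" meaning 1A in base 16; `parse_marked_number` is a hypothetical helper name):

```python
# Subscript digits U+2080..U+2089 used as base markers, as proposed above.
SUBSCRIPTS = "\u2080\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089"

def parse_marked_number(token):
    """Split 'prefix + digits + subscript base' into (prefix, value)."""
    # Step 1: read the trailing subscript run as a decimal base.
    i = len(token)
    while i > 0 and token[i - 1] in SUBSCRIPTS:
        i -= 1
    base = int("".join(str(SUBSCRIPTS.index(c)) for c in token[i:]))
    # Step 2: walk backward to the first character that is not a digit
    # in that base (works for any base up to 36).
    digits = "0123456789abcdefghijklmnopqrstuvwxyz"[:base]
    j = i
    while j > 0 and token[j - 1].lower() in digits:
        j -= 1
    return token[:j], int(token[j:i], base)

print(parse_marked_number("Item1A\u2081\u2086"))  # → ('Item', 26)
print(parse_marked_number("Item777\u2088"))       # → ('Item', 511)
```

One caveat the sketch makes visible: a preceding letter that happens to be a valid digit in the marked base (the "e" of "File" before a hex number, say) would be swallowed by the backward scan, which is why the example uses "Item".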

This solves for all bases up to base 36. Such a system would be
understood on sight by humans.

Or again, if hex numbers are the only issue, use some normal
hex-indication flag in the string so that both humans and the customized
natural sort will know that the number is hex and where the number
begins and ends, e.g. File-0x15A-19, File-0xB23A5-25,
File-0x123ABCD-Extra, in which the center portion, between the two
hyphens, would be recognized as hex by the "0x" prefix.
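A customized natural-sort key along those lines might look like this (a sketch only, assuming hyphen-delimited segments as in the examples above; `hex_aware_key` is an illustrative name):

```python
def hex_aware_key(name):
    # Segments between hyphens; a "0x" prefix marks a segment as hex,
    # a run of plain digits is decimal, anything else sorts as text.
    parts = []
    for seg in name.split("-"):
        if seg.lower().startswith("0x"):
            parts.append(("#", int(seg[2:], 16)))   # hex segment
        elif seg.isdigit():
            parts.append(("#", int(seg)))           # decimal segment
        else:
            parts.append(("s", seg))                # plain text
    return parts

names = ["File-0x15A-19", "File-0xB2-25", "File-0x9F-3"]
print(sorted(names, key=hex_aware_key))
# → ['File-0x9F-3', 'File-0xB2-25', 'File-0x15A-19']
```

Both the human reader and the sort see the same "0x" marker, so nothing has to be inferred from the glyphs of the digits themselves.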

Using symbols that the computer distinguishes automatically but human
beings do not is a *dangerous* solution to any problem. Enough typos are
made even when symbols are different. It is common, when producing
random uppercase alphanumeric codes, to avoid 0, O, Q, 1, I, 5, S, 8, B,
U and V for that reason alone.
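That practice is straightforward to apply when generating such codes; a minimal sketch (`make_code` is an illustrative name):

```python
import secrets
import string

# Uppercase letters and digits, minus the easily-confused characters
# listed above.
CONFUSABLE = set("0OQ1I5S8BUV")
ALPHABET = [c for c in string.ascii_uppercase + string.digits
            if c not in CONFUSABLE]

def make_code(length=8):
    # secrets.choice gives a cryptographically sound random pick.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

print(make_code())  # e.g. "7XKHT4CE"
```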

Now a completely new set of hex digits, as has been suggested, might
make sense. But that is not for Unicode to prescribe, but for
mathematical associations or perhaps some other computer standards
organization. If such a set of digits were proposed by international
organizations with very strong backing (comparable to introduction of
the Euro symbol) then they would certainly have a place in Unicode.

Or if a particular computer language were to introduce them in the PUA
for that language and that usage became popular, then again they would
be encoded by Unicode.

But one wants to avoid, as far as possible, symbols that look identical
to human beings but have radically different meanings. Unicode has
enough of those by necessity and for backward compatibility.

Jim Allan

This archive was generated by hypermail 2.1.5 : Mon Nov 10 2003 - 14:50:37 EST