From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Tue May 22 2007 - 15:47:06 CDT
JFC Morfin wrote on Tuesday, May 22, 2007 6:59 PM
> I need an internationalized table of the hexatridecimal codes 
> (http://en.wikipedia.org/wiki/Base_36) in the largest number of scripts.
> 1. would someone have worked on that topic?
> 2. a first degree solution seems to select in each script 26 graphemes 
> that will be used to transliterate a basic ASCII table.
>     - are there technical objections to that
Yes.  Basic Greek has 24 letters.  You can add 3 more if you include the 
letters used only for numbers.  Hebrew has 22 letters.  Both languages 
traditionally use letters for numeric values.
On the other hand, Thai has 42, 44 or 46 letters depending on whether you 
count the obsolete letters and on whether you count the vowel letters.  The 
official count is 44, to which you can add 10 digits, making base 54.  That 
would be a proper Thai version.
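
To make the arithmetic concrete, here is a minimal Python sketch of positional encoding over an arbitrary digit alphabet. The names (encode, decode, GREEK34, THAI54) are hypothetical, and the choices are assumptions for illustration only: the Greek set is the 24 basic lowercase letters without final sigma, the Thai set is the 44 official consonants without the vowel letters RU and LU, and digits are placed before letters as in base 36.

```python
def encode(n, alphabet):
    """Write the non-negative integer n using alphabet as the digit set."""
    if n < 0:
        raise ValueError("n must be non-negative")
    base = len(alphabet)
    if n == 0:
        return alphabet[0]
    digits = []
    while n:
        n, r = divmod(n, base)
        digits.append(alphabet[r])
    return "".join(reversed(digits))


def decode(text, alphabet):
    """Inverse of encode: read text as a number in base len(alphabet)."""
    base = len(alphabet)
    index = {ch: i for i, ch in enumerate(alphabet)}
    value = 0
    for ch in text:
        value = value * base + index[ch]  # KeyError on a foreign character
    return value


ASCII36 = "0123456789abcdefghijklmnopqrstuvwxyz"

# Assumption: 24 basic lowercase Greek letters, skipping final sigma (U+03C2).
GREEK_LETTERS = "".join(chr(c) for c in range(0x03B1, 0x03CA) if c != 0x03C2)
GREEK34 = "0123456789" + GREEK_LETTERS  # only base 34, not 36

# Assumption: 44 Thai consonants, skipping the vowel letters RU and LU.
THAI_CONSONANTS = "".join(
    chr(c) for c in range(0x0E01, 0x0E2F) if c not in (0x0E24, 0x0E26)
)
THAI_DIGITS = "".join(chr(c) for c in range(0x0E50, 0x0E5A))
THAI54 = THAI_DIGITS + THAI_CONSONANTS  # base 54
```

With these alphabets the same number has a different length in each script, so the tables are not a character-for-character transliteration of the ASCII base-36 form.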
The point of base 36 is that you are using a basic set of characters that 
(a) will be resistant to most text folding operations and (b) use just one 
byte per digit.  Condition (b) will only be satisfied if you use a 
compression scheme like SCSU or use a 'national' code page.  The latter is 
not recommended, and is not available for all phonetically based scripts.
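
Condition (b) can be checked with a small Python sketch. The helper names are hypothetical; the codecs used (utf-8, iso8859_7, tis_620) are ones shipped with Python's standard library. SCSU is not in the standard library, so it is not shown here.

```python
def utf8_len(ch):
    """Number of bytes ch occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def codepage_len(ch, codec):
    """Number of bytes ch occupies in a legacy single-byte code page."""
    return len(ch.encode(codec))


LATIN_A = "a"           # U+0061
GREEK_ALPHA = "\u03b1"  # GREEK SMALL LETTER ALPHA
THAI_KO_KAI = "\u0e01"  # THAI CHARACTER KO KAI

# UTF-8: one byte for ASCII, two for Greek, three for Thai.
sizes_utf8 = [utf8_len(c) for c in (LATIN_A, GREEK_ALPHA, THAI_KO_KAI)]

# 'National' code pages: one byte each, but each covers only its own script.
size_greek = codepage_len(GREEK_ALPHA, "iso8859_7")
size_thai = codepage_len(THAI_KO_KAI, "tis_620")
```

In UTF-8, then, a Thai 'digit' costs three times as much as an ASCII one, which removes much of the compactness that motivates base 36 in the first place.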
You also need to consider your choice of decimal digits carefully.  Do you 
use indigenous digits, or the Arabic digits (i.e. 1234567890)?  The latter 
are often preferred to the 'indigenous' numeral systems.  For example, 
Italians do not normally use Roman numerals, and I've seen examples of Thai 
children's sums performed using Arabic digits and then transliterated into 
Thai digits.  Thai addresses are normally written using Arabic numerals.
So, why are you doing this?
>     - are there advises on the best way to select them for each script
>     - are there advises for the transliteration (same alphabetical order 
> as in ASCII may lead to more easy to compare outputs?)
For the alphabetic portion, I would seriously consider the characters used 
for alphabetically ordered lists.  This can be different from the collation 
order.  For example, for Thai, the order roughly corresponds to the 
alphabetic order, but omits KHO KHUAT, KHO KHON and KHO RAKHANG!  The only 
example I know for the Arabic script is the Persian order.  This follows the 
*old* order of the alphabet, the abjad havaz hoti kalman etc. ordering. 
These orders don't seem to be in CLDR.
>     - would some tables already exist?
> Thank you for your help and inputs.
Richard. 