From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Tue May 22 2007 - 15:47:06 CDT
JFC Morfin wrote on Tuesday, May 22, 2007 6:59 PM
> I need an internationalized table of the hexatridecimal codes
> (http://en.wikipedia.org/wiki/Base_36) in the largest number of scripts.
> 1. would someone have worked on that topic?
> 2. a first degree solution seems to select in each script 26 graphemes
> that will be used to transliterate a basic ASCII table.
> - are there technical objections to that
Yes. Basic Greek has 24 letters. You can add 3 more if you include the
letters used only for numbers. Hebrew has 22 letters. Both languages
traditionally use letters for numeric values.
On the other hand, Thai has 42, 44 or 46 letters depending on whether you
count the obsolete letters and on whether you count the vowel letters. The
official count is 44, to which you can add 10 digits, making base 54. That
would be a proper Thai version.
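
To make the 42/44/46 counts concrete, here is a minimal Python sketch. It assumes the letters are the Unicode range U+0E01..U+0E2E, that the two vowel letters are RU (U+0E24) and LU (U+0E26), and that the two obsolete letters are KHO KHUAT (U+0E03) and KHO KHON (U+0E05). Putting digits before letters, and the name THAI_BASE54, are illustrative choices only, copied from the usual base 36 layout.

```python
import unicodedata

# All 46 letters from KO KAI (U+0E01) to HO NOKHUK (U+0E2E).
THAI_LETTERS = [chr(cp) for cp in range(0x0E01, 0x0E2F)]

# Assumption: these are the "vowel letters" and "obsolete letters" meant above.
VOWEL_LETTERS = {"\u0e24", "\u0e26"}      # RU, LU
OBSOLETE_LETTERS = {"\u0e03", "\u0e05"}   # KHO KHUAT, KHO KHON

# The official count of 44: the range without the two vowel letters.
CONSONANTS = [c for c in THAI_LETTERS if c not in VOWEL_LETTERS]
# The count of 42: additionally drop the two obsolete letters.
CURRENT_CONSONANTS = [c for c in CONSONANTS if c not in OBSOLETE_LETTERS]

# Thai digits ZERO..NINE (U+0E50..U+0E59).
THAI_DIGITS = [chr(cp) for cp in range(0x0E50, 0x0E5A)]

# Hypothetical base 54 alphabet: digits first, then the 44 consonants.
THAI_BASE54 = "".join(THAI_DIGITS + CONSONANTS)

print(len(THAI_LETTERS), len(CONSONANTS), len(CURRENT_CONSONANTS))
print(len(THAI_BASE54))
print(unicodedata.name(THAI_BASE54[0]), "/", unicodedata.name(THAI_BASE54[-1]))
```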
The point of base 36 is that you are using a basic set of characters that
(a) will be resistant to most text folding operations and (b) use just one
byte per digit. Condition (b) will only be satisfied if you use a
compression scheme like SCSU or use a 'national' code page. The latter is
not recommended, and is not available for all phonetically based scripts.
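
Both conditions can be checked with a short Python sketch. The helper names encode and utf8_bytes_per_char are hypothetical, and the final sigma case is only one example of a folding that merges two distinct letters; it is not a survey of text folding operations.

```python
# The usual base 36 digit set: ASCII digits followed by ASCII letters.
ASCII_BASE36 = "0123456789abcdefghijklmnopqrstuvwxyz"

def encode(n, alphabet):
    """Render a non-negative integer using the given digit alphabet."""
    if n < 0:
        raise ValueError("n must be non-negative")
    base = len(alphabet)
    digits = []
    while True:
        n, r = divmod(n, base)
        digits.append(alphabet[r])
        if n == 0:
            break
    return "".join(reversed(digits))

def utf8_bytes_per_char(text):
    """UTF-8 length of each character, as a check on condition (b)."""
    return [len(ch.encode("utf-8")) for ch in text]

print(encode(1295, ASCII_BASE36))            # zz

# Latin z, Greek omega, Thai HO NOKHUK: only the ASCII letter is one byte.
print(utf8_bytes_per_char("z\u03c9\u0e2e"))  # [1, 2, 3]

# Condition (a): case folding maps Greek final sigma onto ordinary sigma,
# so two digits drawn from those letters would collide after folding.
print("\u03c2".casefold() == "\u03c3")       # True
```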
You also need to consider your choice of decimal digits carefully. Do you
use indigenous digits, or the Arabic digits (i.e. 1234567890)? The latter
are often preferred to the 'indigenous' numeral systems. For example,
Italians do not normally use Roman numerals, and I've seen examples of Thai
children's sums performed using Arabic digits and then transliterated into
Thai digits. Thai addresses are normally written using Arabic numerals.
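
The "compute in Arabic digits, then transliterate" practice is easy to mirror in code. The following is a small Python sketch using Thai digits as the example; the function names are hypothetical.

```python
ARABIC_DIGITS = "0123456789"
# Thai digits ZERO..NINE are U+0E50..U+0E59, in the same order.
THAI_DIGITS = "".join(chr(0x0E50 + i) for i in range(10))

TO_THAI = str.maketrans(ARABIC_DIGITS, THAI_DIGITS)
TO_ARABIC = str.maketrans(THAI_DIGITS, ARABIC_DIGITS)

def to_thai_digits(text):
    """Replace each Arabic digit with the corresponding Thai digit."""
    return text.translate(TO_THAI)

def to_arabic_digits(text):
    """Replace each Thai digit with the corresponding Arabic digit."""
    return text.translate(TO_ARABIC)

total = 1234 + 773                  # arithmetic done on ordinary integers
thai = to_thai_digits(str(total))   # transliterated only for display
print(thai)
print(to_arabic_digits(thai))       # 2007
```

Whichever digit set is chosen, a mapping like this keeps it a display-level decision rather than part of the stored value.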
So, why are you doing this?
> - are there advises on the best way to select them for each script
> - are there advises for the transliteration (same alphabetical order
> as in ASCII may lead to more easy to compare outputs?)
For the alphabetic portion, I would seriously consider the characters used
for alphabetically ordered lists. This can be different from the collation
order. For example, for Thai, the order roughly corresponds to the
alphabetic order, but omits KHO KHUAT, KHO KHON and KHO RAKHANG! The only
example I know for the Arabic script is the Persian order. This follows the
*old* order of the alphabet, the abjad havaz hoti kalman etc. ordering.
These orders don't seem to be in CLDR.
> - would some tables already exist?
> Thank you for your help and inputs.
Richard.