Re: hexatridecimal internationalisation

From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Tue May 22 2007 - 15:47:06 CDT


    JFC Morfin wrote on Tuesday, May 22, 2007 6:59 PM

    > I need an internationalized table of the hexatridecimal codes
    > (http://en.wikipedia.org/wiki/Base_36) in the largest number of scripts.

    > 1. would someone have worked on that topic?
    > 2. a first degree solution seems to select in each script 26 graphemes
    > that will be used to transliterate a basic ASCII table.
    > - are there technical objections to that

    Yes. Basic Greek has 24 letters. You can add 3 more if you include the
    letters used only for numbers. Hebrew has 22 letters. Both languages
    traditionally use letters for numeric values.
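
    As a rough check of those counts, here is a minimal Python sketch. It
    assumes the basic letters are the contiguous Unicode ranges U+03B1..U+03C9
    (Greek lowercase) and U+05D0..U+05EA (Hebrew), with positional final forms
    not counted as separate letters. The names GREEK, HEBREW and basic_letters
    are only illustrative.

```python
import unicodedata

def basic_letters(first: int, last: int) -> list[str]:
    """Letters in the code point range first..last, skipping final forms.

    Assumption: a character whose Unicode name contains "FINAL" is a
    positional variant (e.g. GREEK SMALL LETTER FINAL SIGMA), not a
    distinct letter of the alphabet.
    """
    return [
        chr(cp)
        for cp in range(first, last + 1)
        if "FINAL" not in unicodedata.name(chr(cp))
    ]

GREEK = basic_letters(0x03B1, 0x03C9)   # alpha .. omega
HEBREW = basic_letters(0x05D0, 0x05EA)  # alef .. tav

print(len(GREEK), len(HEBREW))
```

    Neither alphabet reaches the 26 letters needed to mirror A-Z one for one.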

    On the other hand, Thai has 42, 44 or 46 letters depending on whether you
    count the obsolete letters and on whether you count the vowel letters. The
    official count is 44, to which you can add 10 digits, making base 54. That
    would be a proper Thai version.
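
    To illustrate what such a base 54 could look like, here is a hedged Python
    sketch. It assumes the 44 letters are the consonants U+0E01..U+0E2E minus
    the vowel letters RU (U+0E24) and LU (U+0E26), and that the ten Thai digits
    U+0E50..U+0E59 come first, mirroring the digits-then-letters layout of
    ASCII base 36. That ordering, and the names encode and decode, are my own
    choices and not an established convention.

```python
# Thai digits ZERO..NINE.
THAI_DIGITS = [chr(cp) for cp in range(0x0E50, 0x0E5A)]

# KO KAI..HO NOKHUK, leaving out the vowel letters RU and LU.
THAI_CONSONANTS = [
    chr(cp) for cp in range(0x0E01, 0x0E2F) if cp not in (0x0E24, 0x0E26)
]

# Assumed digit order: digits first, then letters (as in 0-9, A-Z).
THAI_BASE54 = THAI_DIGITS + THAI_CONSONANTS

def encode(n: int, alphabet: list[str]) -> str:
    """Write a non-negative integer in base len(alphabet)."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    base = len(alphabet)
    if n == 0:
        return alphabet[0]
    out = []
    while n:
        n, r = divmod(n, base)
        out.append(alphabet[r])
    return "".join(reversed(out))

def decode(text: str, alphabet: list[str]) -> int:
    """Inverse of encode(); raises KeyError on a character outside alphabet."""
    base = len(alphabet)
    value = {ch: i for i, ch in enumerate(alphabet)}
    n = 0
    for ch in text:
        n = n * base + value[ch]
    return n

print(len(THAI_BASE54))
print(encode(2007, THAI_BASE54))
```

    The same two functions work for any digit list, so a table per script
    reduces to choosing and ordering the characters.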

    The point of base 36 is that you are using a basic set of characters that
    (a) will be resistant to most text folding operations and (b) use just one
    byte per digit. Condition (b) will only be satisfied if you use a
    compression scheme like SCSU or use a 'national' code page. The latter is
    not recommended, and is not available for all phonetically based scripts.
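
    Both conditions are easy to check. The Python sketch below assumes UTF-8
    as the encoding and picks one sample letter per script; utf8_len is just an
    illustrative helper.

```python
def utf8_len(ch: str) -> int:
    """Number of bytes the character occupies in UTF-8."""
    return len(ch.encode("utf-8"))

samples = {
    "Latin Z": "Z",
    "Greek omega": "\u03c9",
    "Hebrew tav": "\u05ea",
    "Thai KO KAI": "\u0e01",
}
for label, ch in samples.items():
    print(f"{label}: {utf8_len(ch)} byte(s)")

# (a) ASCII base 36 survives case folding: int() accepts either case.
print(int("Z", 36) == int("z", 36))
```

    In UTF-8 only the ASCII digit stays at one byte. The Greek and Hebrew
    letters take two and the Thai letter three.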

    You also need to consider your choice of decimal digits carefully. Do you
    use indigenous digits, or the Arabic digits (i.e. 1234567890)? The latter
    are often preferred to the 'indigenous' numeral systems. For example,
    Italians do not normally use Roman numerals, and I've seen examples of Thai
    children's sums performed using Arabic digits and then transliterated into
    Thai digits. Thai addresses are normally written using Arabic numerals.
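
    If indigenous digits are wanted only for display, the transliteration can
    be a final mapping step applied after the arithmetic is done in Arabic
    digits. A minimal Python sketch, assuming the Thai digits U+0E50..U+0E59
    (to_thai_digits is a hypothetical helper name):

```python
# Map ASCII 0-9 onto THAI DIGIT ZERO..THAI DIGIT NINE.
ASCII_TO_THAI = str.maketrans(
    "0123456789",
    "".join(chr(0x0E50 + i) for i in range(10)),
)

def to_thai_digits(text: str) -> str:
    """Replace ASCII digits with Thai digits; other characters are kept."""
    return text.translate(ASCII_TO_THAI)

print(to_thai_digits(str(1234 + 773)))
```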

    So, why are you doing this?

    > - are there advises on the best way to select them for each script
    > - are there advises for the transliteration (same alphabetical order
    > as in ASCII may lead to more easy to compare outputs?)

    For the alphabetic portion, I would seriously consider the characters used
    for alphabetically ordered lists. This can be different from the collation
    order. For example, for Thai, the order roughly corresponds to the
    alphabetic order, but omits KHO KHUAT, KHO KHON and KHO RAKHANG! The only
    example I know for the Arabic script is the Persian order. This follows the
    *old* order of the alphabet, the abjad havaz hoti kalman etc. ordering.
    These orders don't seem to be in CLDR.

    > - would some tables already exist?
    > Thank you for your help and inputs.

    Richard.


