Re: hexatridecimal internationalisation

From: JFC Morfin (jefsey@jefsey.com)
Date: Thu May 24 2007 - 18:57:29 CDT

    Dear Richard,
    I am sorry for the late reply; I was travelling. Thank you very
    much for these valuable remarks.
    What I need is a transliteration of programming variables, code
    IDs, etc. from ASCII keyboards to non-ASCII keyboards.

    At 22:47 22/05/2007, Richard Wordingham wrote:
    >JFC Morfin wrote on Tuesday, May 22, 2007 6:59 PM
    >>I need an internationalized table of the hexatridecimal codes
    >>(http://en.wikipedia.org/wiki/Base_36) in the largest number of scripts.
    >
    >>1. has anyone worked on that topic?
    >>2. a first-order solution would be to select in each script 26
    >>graphemes to be used to transliterate a basic ASCII table.
    >> - are there technical objections to that?
    >
    >Yes. Basic Greek has 24 letters. You can add 3 more if you include
    >the letters used only for numbers. Hebrew has 22 letters. Both
    >languages traditionally use letters for numeric values.

    >On the other hand, Thai has 42, 44 or 46 letters depending on
    >whether you count the obsolete letters and on whether you count the
    >vowel letters. The official count is 44, to which you can add 10
    >digits, making base 54. That would be a proper Thai version.
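
    The counts above can be checked against the Unicode Thai block. The
    following is only a sketch: it assumes that the 46 letters are the
    code points U+0E01..U+0E2E, that the count of 44 leaves out the two
    vowel letters RU and LU, and that digits come before letters as in
    0-9a-z. The names THAI_BASE54 and to_base are illustrative only.

```python
# Sketch only: one possible reading of "44 letters + 10 digits = base 54".

# U+0E01 (KO KAI) .. U+0E2E (HO NOKHUK): 46 letters in the Unicode Thai block.
THAI_LETTERS_46 = [chr(cp) for cp in range(0x0E01, 0x0E2F)]

# Assumption: the count of 44 excludes the vowel letters RU and LU.
VOWEL_LETTERS = {"\u0e24", "\u0e26"}
THAI_LETTERS_44 = [c for c in THAI_LETTERS_46 if c not in VOWEL_LETTERS]

# Thai digits zero to nine, U+0E50..U+0E59.
THAI_DIGITS = [chr(cp) for cp in range(0x0E50, 0x0E5A)]

# Assumption: digits first, then letters, mirroring the 0-9a-z layout.
THAI_BASE54 = THAI_DIGITS + THAI_LETTERS_44


def to_base(n, alphabet):
    """Write a non-negative integer using the given digit alphabet."""
    if n < 0:
        raise ValueError("n must be non-negative")
    base = len(alphabet)
    if n == 0:
        return alphabet[0]
    out = []
    while n:
        n, r = divmod(n, base)
        out.append(alphabet[r])
    return "".join(reversed(out))


if __name__ == "__main__":
    print(len(THAI_LETTERS_46), len(THAI_LETTERS_44), len(THAI_BASE54))
    print(to_base(2007, THAI_BASE54))
```

    Dropping the two obsolete letters as well would give the count of
    42, and a base of 52 under the same assumptions.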
    >
    >The point of base 36 is that you are using a basic set of characters
    >that (a) will be resistant to most text folding operations and (b)
    >use just one byte per digit. Condition (b) will only be satisfied if
    >you use a compression scheme like SCSU or use a 'national' code
    >page. The latter is not recommended, and is not available for all
    >phonetically based scripts.
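
    Condition (b) can be illustrated by counting encoded bytes. This is
    a sketch using Python's standard codecs; the sample characters and
    the choice of ISO 8859-7 and TIS-620 as 'national' code pages are
    assumptions made for the example. SCSU is not shown, since the
    standard library has no codec for it.

```python
# Sketch only: bytes per digit for a few sample characters.

SAMPLES = {
    "ASCII z": "z",
    "Greek alpha": "\u03b1",
    "Hebrew alef": "\u05d0",
    "Thai ko kai": "\u0e01",
}


def encoded_len(ch, encoding="utf-8"):
    """Number of bytes one character occupies in the given encoding."""
    return len(ch.encode(encoding))


if __name__ == "__main__":
    for name, ch in SAMPLES.items():
        print(name, encoded_len(ch), "byte(s) in UTF-8")
    # With a legacy single-byte code page the non-ASCII letters fit in one byte.
    print("Greek alpha in ISO 8859-7:", encoded_len("\u03b1", "iso8859_7"))
    print("Thai ko kai in TIS-620:", encoded_len("\u0e01", "tis_620"))
```

    In UTF-8 only the ASCII digit stays at one byte; the Greek and
    Hebrew letters take two bytes and the Thai letter takes three.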
    >
    >You also need to consider your choice of decimal digits
    >carefully. Do you use indigenous digits, or the Arabic digits (i.e.
    >1234567890)? The latter are often preferred to the 'indigenous'
    >numeral systems. For example, Italians do not normally use Roman
    >numerals, and I've seen examples of Thai children's sums performed
    >using Arabic digits and then transliterated into Thai digits. Thai
    >addresses are normally written using Arabic numerals.
    >
    >So, why are you doing this?

    Pure transliteration for local spelling and keyboard use.
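
    A minimal sketch of such a digit-for-digit transliteration table
    follows. Everything in it is hypothetical: the helper names, and in
    particular the demo alphabet, which simply takes the first 26
    lowercase Cyrillic code points and keeps the ASCII digits. It is
    not a proposed selection for any script.

```python
# Sketch only: a one-to-one table between ASCII base 36 and a target alphabet.
import string

ASCII_BASE36 = string.digits + string.ascii_lowercase


def make_tables(target_alphabet):
    """Return (forward, backward) translation tables for a 36-sign alphabet."""
    if len(target_alphabet) != 36 or len(set(target_alphabet)) != 36:
        raise ValueError("target alphabet needs exactly 36 distinct characters")
    forward = str.maketrans(ASCII_BASE36, target_alphabet)
    backward = str.maketrans(target_alphabet, ASCII_BASE36)
    return forward, backward


def transliterate(code_id, table):
    """Apply a table to an identifier; case is folded to lowercase first."""
    return code_id.lower().translate(table)


# Arbitrary demo choice (assumption): ASCII digits plus U+0430..U+0449.
DEMO_LETTERS = "".join(chr(cp) for cp in range(0x0430, 0x0430 + 26))
DEMO_ALPHABET = string.digits + DEMO_LETTERS
FORWARD, BACKWARD = make_tables(DEMO_ALPHABET)

if __name__ == "__main__":
    local = transliterate("jfc2007", FORWARD)
    print(local)
    print(transliterate(local, BACKWARD))
```

    A script with fewer than 26 usable graphemes fails the length check
    in make_tables, which is the objection raised for Greek and Hebrew.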

    >> - is there advice on the best way to select them for each script?
    >> - is there advice on the transliteration (might the same
    >> alphabetical order as in ASCII make outputs easier to compare)?
    >
    >For the alphabetic portion, I would seriously consider the
    >characters used for alphabetically ordered lists. This can be
    >different from the collation order. For example, for Thai, the
    >order roughly corresponds to the alphabetic order, but omits KHO
    >KHUAT, KHO KHON and KHO RAKHANG! The only example I know for the
    >Arabic script is the Persian order. This follows the *old* order of
    >the alphabet, the abjad havaz hoti kalman etc. ordering. These
    >orders don't seem to be in CLDR.
    >
    >> - would some tables already exist?
    >>Thank you for your help and inputs.

    Thank you again for the tips.
    Also, from what you say, you seem to have worked on the matter, but
    is there no definitive work in that area?
    All the best.
    jfc


