Re: Transliterator

From: Mark Davis (mark.davis@jtcsv.com)
Date: Fri Apr 29 2005 - 09:48:22 CST


    I should also point out that we do have provision for refinements of the
    script-to-script transliterators: one could have a 'Hindi-English' or
    'Hindi-German' transliterator with different rules. One can also do a
    transcription, which does not round-trip and thus may ignore differences
    in the source.

    You can also make compound transliterators to apply multiple effects. You
    can see that by setting Compound 1 in
    http://ibm.com/software/globalization/icu/demo/transform to

       any-latin; nfd ; [:mark:] remove ; nfc

    With that, you get

    yunikoda kya hai?

    yunikoda pratyeka aksara ke li'e eka visesa nambara pradana karata hai,
    cahe ko'i bhi plaitaphorma ho,
    cahe ko'i bhi programa ho,
    cahe ko'i bhi bhasa ho.
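The last three elements of that compound ID (nfd, [:mark:] remove, nfc) can be approximated outside ICU. Below is a minimal Python sketch using only the standard library's unicodedata module. The helper names (nfd, remove_marks, nfc, compound) are hypothetical, and the sketch assumes the Any-Latin step has already produced accented Latin text like the output quoted from the earlier message; it does not reproduce ICU itself.

```python
import unicodedata
from functools import reduce
from typing import Callable

# A "step" maps text to text, loosely mirroring one element of an ICU
# compound ID such as "any-latin; nfd ; [:mark:] remove ; nfc".
# (Hypothetical helpers; this is not ICU's API.)
Step = Callable[[str], str]


def nfd(text: str) -> str:
    """Canonical decomposition, e.g. U+016B (u with macron) -> u + U+0304."""
    return unicodedata.normalize("NFD", text)


def remove_marks(text: str) -> str:
    """Drop every code point whose general category is Mn, Mc or Me."""
    return "".join(ch for ch in text if not unicodedata.category(ch).startswith("M"))


def nfc(text: str) -> str:
    """Recompose whatever is left after the marks are gone."""
    return unicodedata.normalize("NFC", text)


def compound(*steps: Step) -> Step:
    """Apply the steps left to right, feeding each one the previous output."""
    return lambda text: reduce(lambda acc, step: step(acc), steps, text)


# Stands in for the last three elements of the compound ID only;
# the Any-Latin step itself is not reproduced here.
strip_diacritics = compound(nfd, remove_marks, nfc)

print(strip_diacritics("yūnikōḍa kyā hai?"))  # yunikoda kya hai?
```

Decomposing first is what makes the removal step work: characters such as ḍ or ṣ are single precomposed code points, and only after NFD is the dot below a separate mark that can be filtered out.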

    Mark

    ----- Original Message -----
    From: "Bob Eaton" <pete_dembrowski@hotmail.com>
    To: <wiedenhoeft@gmx.net>; <mark.davis@jtcsv.com>
    Cc: <unicode@unicode.org>
    Sent: Friday, April 29, 2005 07:25
    Subject: Re: Transliterator

    > >>yūnikōḍa kyā hai?
    > >>
    > >>yūnikōḍa pratyēka akṣara kē li'ē ēka viśēṣa nambara pradāna
    > >>karatā hai,
    > >>cāhē kō'ī bhī plaiṭaphŏrma hō,
    > >>cāhē kō'ī bhī prōgrāma hō,
    > >>cāhē kō'ī bhī bhāṣā hō.
    > >
    > >This seems a bit mechanical to me, because in transliteration you'll have
    > >to drop many inherent vowels: at the end of most words (but not in
    > >Sanskrit loanwords, and not after y, l, ...), in a syllable before a
    > >non-inherent vowel (करता is kartā, not karatā), in each syllable
    > >between two syllables with vowels, and so on:
    >
    > [...]
    >
    > >Resolving the ambiguity between औ au / ऐ ai and अउ a'u / अइ a'i with an
    > >apostrophe doesn't seem like a good idea. Using digraphs au / ai for the
    > >former to enable backward transliteration might be a good idea, if they
    > >can be encoded in Unicode (besides, I don't think अउ a'u / अइ a'i will
    > >ever occur in Hindi; you'd rather write अय ay / अव av).
    >
    > In fact, the transliterator here is called "Devanagari-Latin" and is not
    > necessarily related to Hindi. In order to make it reversible for arbitrary
    > Devanagari text (for the many minority languages in South Asia that also
    > use Devanagari), these "infelicities" were necessary.
    >
    > Bob
    >
    > P.S. Check out http://scripts.sil.org/EncCnvtrs for a package that
    > includes a .Net wrapper for these ICU transliterators allowing them to be
    > used (more easily) in .Net/COM enabled programs (VBA, C#, etc).
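The point about reversibility in the quoted exchange above can be illustrated with a small sketch: when a single vowel letter (औ) and a sequence of two (अ उ) would both come out as "au", the forward direction has to insert a separator so that a greedy, longest-match reverse pass can tell them apart. The table, the names (TO_LATIN, to_latin, to_devanagari) and the separator handling below are hypothetical and cover only a handful of independent vowel letters. They are not ICU's Devanagari-Latin rules, and real text, where vowels mostly appear as dependent signs after consonants, needs far more.

```python
# Hypothetical subset of a Devanagari -> Latin table: independent vowel
# letters only. NOT ICU's rule set; just enough to show the
# औ "au" vs अउ "a'u" ambiguity.
TO_LATIN = {
    "\u0905": "a",   # अ
    "\u0907": "i",   # इ
    "\u0909": "u",   # उ
    "\u0910": "ai",  # ऐ
    "\u0914": "au",  # औ
}
FROM_LATIN = {latin: deva for deva, latin in TO_LATIN.items()}
SEPARATOR = "'"
LONGEST_KEY = max(len(key) for key in FROM_LATIN)


def to_latin(text: str) -> str:
    out = ""
    for ch in text:
        latin = TO_LATIN[ch]
        # If the previous letter plus this one would spell another Latin key
        # ("a" + "u" -> "au"), insert the separator so the reverse direction
        # does not read the pair as one vowel. With keys of at most two
        # letters, looking at one character on each side is enough.
        if out and (out[-1] + latin[0]) in FROM_LATIN:
            out += SEPARATOR
        out += latin
    return out


def to_devanagari(text: str) -> str:
    out = []
    i = 0
    while i < len(text):
        if text[i] == SEPARATOR:
            i += 1  # the separator only marks a boundary; it maps to nothing
            continue
        # Greedy longest match: try "au" before "a".
        for size in range(LONGEST_KEY, 0, -1):
            chunk = text[i:i + size]
            if chunk in FROM_LATIN:
                out.append(FROM_LATIN[chunk])
                i += len(chunk)
                break
        else:
            raise ValueError(f"no mapping for {text[i]!r} at position {i}")
    return "".join(out)


print(to_latin("\u0914"), to_latin("\u0905\u0909"))  # au a'u
```

Dropping the separator would make to_latin shorter and arguably prettier for Hindi, but then "au" could no longer be mapped back unambiguously, which is the trade-off described for a transliterator meant to handle any Devanagari text.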



    This archive was generated by hypermail 2.1.5 : Fri Apr 29 2005 - 09:50:43 CST