RE: Unicode Transliteration Guidelines released

From: Richard Ishida
Date: Mon Jan 28 2008 - 05:59:58 CST

    I personally find reversible transliterations useful for understanding
    at a glance what components a string of complex glyphs contains in a
    script I'm not terribly familiar with. For example, people who aren't
    very familiar with the nastaliq forms of Urdu can struggle to figure out
    what characters are in front of them. The same goes for some
    combinations of characters in scripts like Khmer and Malayalam, e.g. is
    that a Khmer r or the very similar vowel sign, or is it the vowel sign
    with multiple parts... In those cases it's not so much being able to
    reverse the notation that matters to me; it's knowing that there's no
    ambiguity between the characters in the transliteration and those in the
    script. It can also help me recognize and remember words better, e.g.
    for Hebrew, a script I don't know much about at all, I can distinguish
    character names in transliterated Latin form much more easily than in
    the original Hebrew.


    Richard Ishida
    Internationalization Lead
    W3C (World Wide Web Consortium)


    > -----Original Message-----
    > From: William J Poser
    > Sent: 28 January 2008 01:47
    > Subject: RE: Unicode Transliteration Guidelines released
    >
    > I agree that it is very odd for Unicode to be promulgating
    > transliterations, since an appropriate transliteration is not
    > only specific to a pair of languages but depends on the purpose
    > for which it is intended.
    >
    > There are, however, uses for ASCII transliterations even with the
    > advent of Unicode. I have had to create and implement several such
    > for the Linguistic Data Consortium. One reason for using them
    > is that sometimes people want to use existing software that cannot
    > handle Unicode, so you need to ascify the text, run it through,
    > and then convert it back. For this purpose, the transliteration can
    > be pretty arbitrary so long as it is reversible. Indeed, some people
    > here have used a slightly modified form of the Unicode character names
    > as the ASCII transliteration. It is long-winded, but the computers
    > do the work and they don't seem to mind.
    >
    > Another reason for using an ASCII transliteration is when you've
    > got computational linguists working on a language that they don't
    > know well, whose writing system they cannot easily work with.
    > In this case, you want the transliteration to be less arbitrary
    > and to give some idea of the pronunciation so that they can talk
    > to themselves and each other about the data (suppose, for example,
    > they've got to write a morphological analyzer).
    >
    > Bill
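
The reversible, name-based ASCII form Bill describes can be sketched roughly
as follows. This is only an illustration, not the scheme used at the
Linguistic Data Consortium: the brace-delimited token format, the U+XXXX
fallback for unnamed code points, and the function names to_ascii and
from_ascii are all assumptions made for this example. It relies on Python's
standard unicodedata module for the character names.

```python
import re
import unicodedata


def to_ascii(text: str) -> str:
    """Replace every non-ASCII character (and literal braces) with a
    {CHARACTER_NAME} token so that the result is pure ASCII."""
    out = []
    for ch in text:
        # Plain ASCII passes through, except braces, which delimit tokens.
        if ord(ch) < 128 and ch not in "{}":
            out.append(ch)
            continue
        try:
            # Unicode names never contain underscores, so this is reversible.
            token = unicodedata.name(ch).replace(" ", "_")
        except ValueError:
            # Unnamed code points (e.g. private use): hypothetical fallback.
            token = f"U+{ord(ch):04X}"
        out.append("{" + token + "}")
    return "".join(out)


_TOKEN = re.compile(r"\{([^{}]+)\}")


def _decode_token(match: re.Match) -> str:
    token = match.group(1)
    if token.startswith("U+"):
        return chr(int(token[2:], 16))
    return unicodedata.lookup(token.replace("_", " "))


def from_ascii(encoded: str) -> str:
    """Invert to_ascii by turning each {TOKEN} back into its character."""
    return _TOKEN.sub(_decode_token, encoded)
```

With this sketch, from_ascii(to_ascii(s)) returns the original string s, and
the intermediate form can be passed through software that only handles ASCII,
provided that software leaves the brace tokens untouched. As Bill notes, the
output is long-winded, which is the price of using full character names.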
