RE: Unicode Transliteration Guidelines released

From: Richard Ishida
Date: Mon Jan 21 2008 - 09:35:58 CST

    Just a couple of other notes:


    An interesting example I came across while doing martial arts: the kick called mae geri in English transcription is mae geli in French transcription, since the r in French is a very different sound.


    The examples the document uses relate to very simple issues. I would suggest that you mention that reversible transliterations can often be positively misleading with respect to pronunciation for scripts where the same symbol can represent more than one sound, or affect other sounds, depending on context. This is particularly common with South Asian and South-East Asian scripts. For example, in the Bengali নিঃশব, transliterated as niḥśaba in the CLDR system, the visarga ḥ is not itself pronounced (whereas elsewhere it is) but lengthens the ś sound; the final inherent a is pronounced (whereas it commonly is not); and the two inherent a's are pronounced as ɔ and ô, respectively.

    The level of knowledge required to interpret such a transliteration phonetically is far greater than for the examples you mention, and unless you are quite skilled, you can't expect to reliably work out the phonetics of the actual text.
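
    To make the point concrete, here is a minimal Python sketch of a context-free, reversible-style mapping. It is only an illustration: the table is a toy one that I made up to cover the five code points of the example word, and the function name transliterate is hypothetical. It is not the CLDR rule set. Each symbol is mapped the same way wherever it occurs, so nothing in the output records how the visarga or the inherent vowels are actually pronounced in this word.

```python
# Toy, context-free table covering only the code points of the example word.
# Assumption: this is an illustrative mapping, NOT the actual CLDR rules.
CONSONANTS = {
    "\u09a8": "n",       # BENGALI LETTER NA
    "\u09b6": "\u015b",  # BENGALI LETTER SHA -> s with acute
    "\u09ac": "b",       # BENGALI LETTER BA
}
VOWEL_SIGNS = {"\u09bf": "i"}   # BENGALI VOWEL SIGN I
OTHER = {"\u0983": "\u1e25"}    # BENGALI SIGN VISARGA -> h with dot below
INHERENT_VOWEL = "a"


def transliterate(text):
    """Map each symbol independently; pronunciation context is ignored."""
    out = []
    pending_inherent = False  # True right after a consonant
    for ch in text:
        if ch in VOWEL_SIGNS:
            # A vowel sign replaces the preceding consonant's inherent vowel.
            pending_inherent = False
            out.append(VOWEL_SIGNS[ch])
            continue
        if pending_inherent:
            # No vowel sign followed, so spell out the inherent vowel.
            out.append(INHERENT_VOWEL)
            pending_inherent = False
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            pending_inherent = True
        else:
            # Visarga is mapped via OTHER; unknown characters pass through.
            out.append(OTHER.get(ch, ch))
    if pending_inherent:
        out.append(INHERENT_VOWEL)
    return "".join(out)


word = "\u09a8\u09bf\u0983\u09b6\u09ac"  # the Bengali example word
print(transliterate(word))
```

    The visarga and both inherent vowels come out as fixed letters regardless of their neighbours, which is exactly why a reader needs knowledge of the language to recover the pronunciation from such a transliteration.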


    Another thing to look out for when dealing with cased scripts is simply that the characters in the target must always be capable of switching case too, i.e. many IPA symbols such as ʃ cannot be used since they cannot represent case distinctions.


    Richard Ishida
    Internationalization Lead
    W3C (World Wide Web Consortium)


    > -----Original Message-----
    > From: [] On
    > Behalf Of Rick McGowan
    > Sent: 19 January 2008 16:58
    > To:
    > Subject: Unicode Transliteration Guidelines released
    > The Unicode CLDR committee has released
    > "Unicode Transliteration Guidelines":
    > Regards,
    > Rick McGowan
    > Unicode, Inc.
