RE: Unicode Transliteration Guidelines released

From: Richard Ishida (
Date: Mon Jan 21 2008 - 13:12:04 CST


    The things you describe at the beginning of your list below are what I would call transcriptions, rather than transliterations. I agree that there is no need to represent the case of the source in those. But equally, for many scripts there is no reliable way to reconstruct the source script from something like IPA.

    What I'm talking about is what I called transliteration, which I defined as a method of converting text that allows you to recreate the original source from the target (i.e. reversibility). If you want to do that for a multicameral source script, you need some way of capturing whether the source contained uppercase or lowercase characters.*
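    To make the reversibility criterion concrete, here is a minimal Python (3.9+) sketch, assuming simple character-for-character tables. Both tables and the helper names (is_reversible, invert, convert) are hypothetical illustrations, not any published scheme. It relies on one real fact: Modern Greek pronounces η, ι and υ all as [i], so a phonetic transcription merges them and the source cannot be recovered, whereas a letter-for-letter transliteration that keeps them distinct can be inverted. Note that per-character injectivity is necessary but not sufficient once targets are multi-character strings (e.g. "th" versus "t" + "h").

```python
# Hypothetical, simplified lowercase-only tables for illustration; not a published standard.
# Modern Greek pronounces eta, iota and upsilon all as [i], so a phonetic transcription merges them.
GREEK_TO_IPA = {"α": "a", "η": "i", "ι": "i", "υ": "i"}
# A made-up letter-for-letter transliteration that keeps every source letter distinct.
GREEK_TO_LATIN = {"α": "a", "η": "ē", "ι": "i", "υ": "y"}


def is_reversible(mapping: dict[str, str]) -> bool:
    """True if no two source characters share a target (the mapping is injective)."""
    return len(set(mapping.values())) == len(mapping)


def invert(mapping: dict[str, str]) -> dict[str, str]:
    """Build the target-to-source table, refusing many-to-one mappings."""
    if not is_reversible(mapping):
        raise ValueError("many-to-one mapping: the source cannot be recovered")
    return {target: source for source, target in mapping.items()}


def convert(mapping: dict[str, str], text: str) -> str:
    """Convert character by character; characters not in the table pass through unchanged."""
    return "".join(mapping.get(ch, ch) for ch in text)


latin = convert(GREEK_TO_LATIN, "ηυ")                  # "ēy"
assert convert(invert(GREEK_TO_LATIN), latin) == "ηυ"  # round trip succeeds
print(convert(GREEK_TO_IPA, "ηυ"))                     # "ii": was it η, ι or υ? Unknowable.
```

    Calling invert(GREEK_TO_IPA) raises ValueError, which is the "no reliable way to reconstruct the source" case in code form.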

    This discussion is exactly why I wrote earlier that I think the Transliteration Guidelines document should be more careful in separating, describing and labeling these two different approaches.


    * You could, of course, use ʃ in a 'transliteration scheme' if you included additional information, such as an up-arrow immediately after it to indicate that it should be converted to an uppercase character.
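    The up-arrow idea in the footnote can be sketched as follows. This is a minimal Python illustration under stated assumptions: the marker (U+2191 UPWARDS ARROW), the tiny Cyrillic-to-IPA-style table, and the function names transliterate and restore are all hypothetical. Each source character is lowercased and mapped through a lowercase-only table; if the original was uppercase, the marker is appended, and the reverse direction consumes the marker to restore case.

```python
# Hypothetical case marker: U+2191 UPWARDS ARROW, placed after a symbol
# whose source character was uppercase.
UP = "\u2191"

# Hypothetical, tiny lowercase-only table from Cyrillic to IPA-style symbols.
BASE = {"ж": "ʒ", "и": "i", "л": "l", "ш": "ʃ"}
REVERSE = {target: source for source, target in BASE.items()}


def transliterate(text: str) -> str:
    """Map each character via BASE and append UP when the source was uppercase."""
    out = []
    for ch in text:
        low = ch.lower()
        if low in BASE:
            out.append(BASE[low])
            if ch != low:  # the source character was uppercase
                out.append(UP)
        else:
            out.append(ch)  # unmapped characters pass through unchanged
    return "".join(out)


def restore(text: str) -> str:
    """Invert transliterate(): look up each symbol and re-apply case when UP follows it."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        src = REVERSE.get(ch, ch)
        if ch in REVERSE and i + 1 < len(text) and text[i + 1] == UP:
            src = src.upper()
            i += 1  # consume the case marker
        out.append(src)
        i += 1
    return "".join(out)


print(transliterate("Жил"))           # ʒ↑il
print(restore(transliterate("Жил")))  # Жил
```

    The sketch is only reversible for text drawn from the table's source characters: a literal "↑" or a Latin "i" already present in the source would be misread on the way back, so a real scheme would also need an escaping convention.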

    Richard Ishida
    Internationalization Lead
    W3C (World Wide Web Consortium)


    > -----Original Message-----
    > From: Philippe Verdy []
    > Sent: 21 January 2008 18:43
    > To: 'Richard Ishida'; 'Rick McGowan';
    > Subject: RE: Unicode Transliteration Guidelines released
    > Richard Ishida wrote:
    > > Cautions
    > >
    > > Another thing to look out for when dealing with cased scripts is simply
    > > that the characters in the target must always be capable of switching
    > > case too, i.e. many IPA symbols such as ʃ cannot be used since they
    > > cannot represent case distinctions.
    > Why?
    > That requirement only applies if the target supports multicameral
    > orthographies in the first place.
    > * If the target is IPA, no such requirement applies.
    > * The same holds for transliteration to X-SAMPA: although it uses the
    > basic Latin alphabet, it has no case distinction (lowercase and
    > uppercase letters are used for distinct sounds).
    > * The same holds for transliteration to the Hangul alphabet or Georgian
    > (true most of the time with modern or old classical orthographies, but
    > possibly false for classical religious texts), or to Arabic, Hebrew,
    > syllabaries (Canadian Aboriginal syllabics, Cherokee, Japanese kana...),
    > or the many Indic abugidas (including Tibetan).
    > Multicameral scripts are the exception (even though they predominate in
    > worldwide use), not the rule.
