RE: Unicode Transliteration Guidelines released

From: Philippe Verdy
Date: Mon Jan 28 2008 - 05:16:27 CST

    William J Poser wrote:
    > Sent: Monday, 28 January 2008 02:47
    > To:;;
    > Subject: RE: Unicode Transliteration Guidelines released
    > I agree that I find it very odd for Unicode to be promulgating
    > transliterations, since an appropriate transliteration is not
    > only specific to a pair of languages but depends on the purpose
    > for which it is intended.
    > There are, however, uses for ASCII transliterations even with the
    > advent of Unicode. I have had to create and implement several such
    > for the Linguistic Data Consortium. One reason for using them
    > is that sometimes people want to use existing software that cannot
    > handle Unicode, so you need to ascify the text, run it through,
    > and then convert it back. For this purpose, the transliteration can
    > be pretty arbitrary so long as it is reversible.

    I wouldn't call this a transliteration. In fact, it is much more efficient
    to just use an alternate representation of the code points, without having
    to rely on complex conversion tables.

    See the "\uNNNN" syntactic notation, for example (used along with escaping
    mechanisms), which can serve this purpose of "ASCII-fication" and of
    compatibility with more limited protocols (including cases where letter
    case is not preserved).

    It's much easier to use this sort of transform, for which you can really
    ensure full reversibility, even in the trickiest cases and for arbitrary
    Unicode source strings. But I won't try to convince others that this is a
    "transliteration"!
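
A minimal sketch of such a reversible escape transform, written in Python
for illustration. The function names asciify and deasciify are hypothetical,
and the use of "\UNNNNNNNN" for code points above U+FFFF is an assumption
borrowed from Python's own string syntax, not something the "\uNNNN"
notation above prescribes. The backslash itself is always escaped, so every
backslash in the output starts an escape and decoding is unambiguous. This
sketch does not cover channels that fold letter case, since it relies on
"\u" and "\U" being distinguishable.

```python
import re


def asciify(text: str) -> str:
    """Escape everything except printable ASCII; the backslash is escaped too."""
    out = []
    for ch in text:
        cp = ord(ch)
        if 0x20 <= cp < 0x7F and ch != "\\":
            out.append(ch)                 # printable ASCII passes through
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04X}")     # BMP code point: 4 hex digits
        else:
            out.append(f"\\U{cp:08X}")     # supplementary: 8 hex digits (assumed form)
    return "".join(out)


# Every backslash in escaped text begins one of these two escape forms.
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def deasciify(escaped: str) -> str:
    """Invert asciify() by replacing each escape with its code point."""
    return _ESCAPE.sub(
        lambda m: chr(int(m.group(1) or m.group(2), 16)), escaped
    )
```

Because no conversion table is involved, the round trip
deasciify(asciify(s)) == s can be checked for any input string s, including
strings that already contain text that looks like an escape sequence.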

    My opinion is that all conversion processes of Unicode texts that are FULLY
    reversible should not be named "transliterations", but "format transforms"
    (this would include for example all transcoders compatible with Unicode,
    lossless data compressors, transport encoding syntaxes like hexadecimal or

    By contrast, the intent of a transliterator is not to preserve the
    original text but to make the text readable for humans. Full
    reversibility is, most of the time, only a technical need, not a
    linguistic one: the linguistic need is not full reversibility for arbitrary
    texts, but for texts that make sense in some set of human languages
    written in a given source script with their usually accepted or
    preferred orthographies.
