Re: Transliterating ancient scripts [was: ASCII and Unicode lifespan]

From: Gregg Reynolds
Date: Mon May 23 2005 - 16:23:30 CDT

    Dean Snyder wrote:
    > Tom Emerson wrote at 10:07 AM on Monday, May 23, 2005:
    >>Dean Snyder writes:
    >>>Transliteration is lossy.
    >>Not necessarily.
    > Buckwalter's transliteration of Arabic <
    > transliteration.htm> is, as are all transliterations, lossy. You cannot
    > tell, for example, from this transliteration that Arabic r & z are
    > differentiated only by a tiny dot. THAT is pertinent information in many
    > contexts.

    Huh? Latin "r" denotes the Arabic letter called راء and Latin "z"
    denotes the letter called زين; where's the confusion? Can you tell that
    difference from the integers x0631 and x0632?
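
    To make the point concrete, here is a minimal Python sketch (the
    helper name describe is made up for illustration). It prints what the
    encoding actually stores for the two letters: a number and a
    character name, and nothing about dots or letter shapes.

```python
import unicodedata

REH = "\u0631"   # the letter transliterated as "r"
ZAIN = "\u0632"  # the letter transliterated as "z"

def describe(ch):
    """Return the code point and the Unicode character name of ch."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

for ch in (REH, ZAIN):
    print(describe(ch))

# The two integers are adjacent, but that adjacency says nothing about
# how the glyphs differ on the page.
print(ord(ZAIN) - ord(REH))
```

    Whatever is known about the dot comes from a font or from the
    reader, not from the integers themselves.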

    An example of a bigger problem (or a more solid one anyway) is that
    Buckwalter's scheme doesn't have any way of indicating that the hamza
    should fall beneath the ya as sometimes happens. It has other similar
    problems in representing traditionally written text. But then again, so
    does Unicode (which after all is itself a transliteration from
    letterforms to numbers). In fact I think calling mathematical models of
    written languages "transliterations" is a bit misleading, since we're
    ultimately talking about numbers. A (computational) transliteration is
    an encoding design by another name; its degree of lossiness depends
    entirely on how well designed it is.
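
    A rough way to test "how well designed" is a round-trip check. The
    sketch below assumes a toy one-character-to-one-character table; the
    tables and function names are hypothetical, not Buckwalter's scheme
    or any other published one. A table whose mapping is one-to-one
    round-trips; a table that merges two letters does not.

```python
# Toy tables, for illustration only (single-character mappings).
GOOD = {
    "\u0631": "r",  # REH
    "\u0632": "z",  # ZAIN
    "\u0628": "b",  # BEH
    "\u0643": "k",  # KAF
    "\u062A": "t",  # TEH
}
# A deliberately bad design: REH and ZAIN collapse onto the same symbol.
BAD = dict(GOOD)
BAD["\u0632"] = "r"

def is_injective(table):
    """True if no two source letters share a target symbol."""
    return len(set(table.values())) == len(table)

def encode(text, table):
    return "".join(table[ch] for ch in text)

def decode(text, table):
    reverse = {latin: arabic for arabic, latin in table.items()}
    return "".join(reverse[ch] for ch in text)

def round_trips(text, table):
    """True if encoding then decoding gives back the original text."""
    return decode(encode(text, table), table) == text

sample = "\u0631\u0632"  # REH followed by ZAIN
print(is_injective(GOOD), round_trips(sample, GOOD))
print(is_injective(BAD), round_trips(sample, BAD))
```

    The loss in the second table comes from the design of the table,
    not from the fact that the target alphabet is Latin.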

    At least in the case of Arabic, it's possible to design an encoding
    (call it a transliteration if you'd like) that loses no information
    going from page to computer. I made one using Latin-1 a few years ago
    and was able to encode Quranic text accurately. In fact, I was able to
    encode lots more information than just that - for example, the
    distinction between radicals and non-radicals, the deep-spelling of
    words (e.g. omission of a radical), etc. Which goes to show that in
    encoding (= transliteration) design it's just a question of how much
    information you want to capture; there's no unavoidable lossiness
    that I can see.
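
    As a sketch of how an encoding can carry more than the letters
    themselves, here is a made-up convention (not the Latin-1 scheme
    described above, whose details are not given here): an uppercase
    Latin letter marks a radical and a lowercase one marks a
    non-radical. The table and function names are hypothetical.

```python
# Hypothetical table: lowercase Latin key -> Arabic letter.
LETTERS = {
    "k": "\u0643",  # KAF
    "a": "\u0627",  # ALEF
    "t": "\u062A",  # TEH
    "b": "\u0628",  # BEH
}
REVERSE = {arabic: latin for latin, arabic in LETTERS.items()}

def decode_annotated(encoded):
    """Return (arabic_letter, is_radical) pairs; uppercase means radical."""
    return [(LETTERS[ch.lower()], ch.isupper()) for ch in encoded]

def encode_annotated(pairs):
    """Inverse of decode_annotated."""
    return "".join(
        REVERSE[letter].upper() if is_radical else REVERSE[letter]
        for letter, is_radical in pairs
    )

def plain_text(encoded):
    """Drop the annotation and keep only the Arabic letters."""
    return "".join(letter for letter, _ in decode_annotated(encoded))

word = "KaTB"  # kaf, alef, teh, beh with k, t, b marked as radicals
print(plain_text(word))
print(encode_annotated(decode_annotated(word)) == word)
```

    The plain Arabic spelling is recoverable, and the radical marking
    rides along as extra information that the bare text does not carry.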

