Re: Transliterating ancient scripts [was: ASCII and Unicode lifespan]

From: Gregg Reynolds (unicode@arabink.com)
Date: Mon May 23 2005 - 16:23:30 CDT

Next message: Hans Aberg: "Re: ASCII and Unicode lifespan"

Previous message: Philippe Verdy: "Re: hebrew font conversion"
In reply to: Dean Snyder: "Re: Transliterating ancient scripts [was: ASCII and Unicode lifespan]"
Next in thread: Dean Snyder: "Re: Transliterating ancient scripts [was: ASCII and Unicode lifespan]"
Reply: Dean Snyder: "Re: Transliterating ancient scripts [was: ASCII and Unicode lifespan]"

Dean Snyder wrote:
> Tom Emerson wrote at 10:07 AM on Monday, May 23, 2005:
>
>
>>Dean Snyder writes:
>>
>>>Transliteration is lossy.
>>
>>Not necessarily.
>
..etc
>
> Buckwalter's transliteration of Arabic
> <http://www.qamus.org/transliteration.htm> is, as are all transliterations, lossy. You cannot
> tell, for example, from this transliteration that Arabic r & z are
> differentiated only by a tiny dot. THAT is pertinent information in many
> contexts.
>

Huh? Latin "r" denotes the Arabic letter called راء and Latin "z"
denotes the letter called زين; where's the confusion? Can you tell that
difference from the code points U+0631 and U+0632?
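To make the point concrete: neither the Latin letters nor the Unicode integers record the dot; each just assigns the letter a distinct symbol. A small Python sketch using the standard unicodedata module (the two-entry table is only an excerpt of Buckwalter's scheme, included for illustration):

```python
import unicodedata

# Two Arabic letters that differ on the page only by a dot.
# Values are Buckwalter's symbols for them (illustrative excerpt).
PAIR = {"\u0631": "r", "\u0632": "z"}

def describe(ch: str) -> str:
    """Return the code point, Unicode name, and Buckwalter symbol for ch."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {PAIR[ch]}"

for ch in PAIR:
    print(describe(ch))
```

Nothing in "U+0631 ARABIC LETTER REH" or "r" says "same shape, minus a dot"; both schemes are equally silent about letterform.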

An example of a bigger problem (or a more solid one anyway) is that
Buckwalter's scheme doesn't have any way of indicating that the hamza
should fall beneath the ya as sometimes happens. It has other similar
problems in representing traditionally written text. But then again, so
does Unicode (which after all is itself a transliteration from
letterforms to numbers). In fact, I think calling mathematical models of
written languages "transliterations" is a bit misleading, since we're
ultimately talking about numbers. A (computational) transliteration is
an encoding design by another name; its degree of lossiness depends
entirely on how well designed it is.
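The claim that lossiness is a property of the design, not of transliteration as such, can be checked mechanically: a character-for-character table round-trips exactly when every source character is covered and no two characters share a symbol. A minimal Python sketch, assuming a small excerpt of a Buckwalter-style table (check the symbols against the qamus.org page before relying on them; the function names are hypothetical):

```python
# Excerpt of a Buckwalter-style table (assumed subset, for illustration only).
# Every symbol is a single character, which decode() relies on.
TABLE = {
    "\u0621": "'",   # hamza
    "\u0626": "}",   # yeh with hamza above
    "\u0627": "A",   # alef
    "\u0628": "b",   # beh
    "\u062A": "t",   # teh
    "\u0631": "r",   # reh
    "\u0632": "z",   # zain
    "\u0643": "k",   # kaf
    "\u064A": "y",   # yeh
}

def is_injective(table: dict[str, str]) -> bool:
    """A table can only round-trip if no two source characters share a symbol."""
    return len(set(table.values())) == len(table)

def encode(text: str, table: dict[str, str]) -> str:
    """Transliterate text; fail loudly on characters the table cannot represent."""
    missing = sorted({c for c in text if c not in table})
    if missing:
        raise ValueError("unrepresentable: " + ", ".join(f"U+{ord(c):04X}" for c in missing))
    return "".join(table[c] for c in text)

def decode(code: str, table: dict[str, str]) -> str:
    """Invert the table; only meaningful when is_injective(table) holds."""
    inverse = {v: k for k, v in table.items()}
    return "".join(inverse[s] for s in code)

if __name__ == "__main__":
    word = "\u0643\u062A\u0627\u0628"           # kaf, teh, alef, beh
    code = encode(word, TABLE)
    print(code, decode(code, TABLE) == word)    # ktAb True

    merged = {**TABLE, "\u0632": "r"}           # zain now shares reh's symbol
    print(is_injective(merged))                 # False

    try:
        encode("\u064A\u0655", TABLE)           # yeh + combining hamza below
    except ValueError as err:
        print(err)                              # unrepresentable: U+0655
```

Merging r and z is a design error that is_injective catches; the hamza-below case is a coverage gap, and the sketch raises on it rather than silently dropping the mark. Either way the loss comes from the table, not from transliteration per se.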

At least in the case of Arabic, it's possible to design an encoding
(call it a transliteration if you'd like) that loses no information
going from page to computer. I made one using Latin-1 a few years ago
and was able to encode Quranic text accurately. In fact, I was able to
encode lots more information than just that - for example, the
distinction between radicals and non-radicals, the deep-spelling of
words (e.g. omission of a radical), etc. This just goes to show that
in encoding (=transliteration) design it's just a question of how much
info you want to capture; there's no unavoidable lossiness that I can see.
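As an illustration of "capturing more info", here is a hypothetical way to carry the radical/non-radical distinction in plain ASCII (hence Latin-1) text. This is NOT the Latin-1 scheme described above, whose details aren't given here; the "+" marker and the function names are assumptions chosen for the sketch:

```python
# Hypothetical: "+" before a symbol marks a non-radical letter.
# Assumes letter symbols are single characters and never "+" themselves.
MARK = "+"

def encode_word(letters: list[tuple[str, bool]]) -> str:
    """Encode (symbol, is_radical) pairs; non-radicals get a MARK prefix."""
    if any(sym == MARK for sym, _ in letters):
        raise ValueError("MARK collides with a letter symbol")
    return "".join(sym if radical else MARK + sym for sym, radical in letters)

def decode_word(code: str) -> list[tuple[str, bool]]:
    """Recover the pairs by reading MARK as 'next letter is non-radical'."""
    out: list[tuple[str, bool]] = []
    i = 0
    while i < len(code):
        if code[i] == MARK:
            if i + 1 >= len(code):
                raise ValueError("dangling MARK at end of input")
            out.append((code[i + 1], False))
            i += 2
        else:
            out.append((code[i], True))
            i += 1
    return out

if __name__ == "__main__":
    # kitAb: root k-t-b, the alef is not a radical
    word = [("k", True), ("t", True), ("A", False), ("b", True)]
    code = encode_word(word)
    print(code, decode_word(code) == word)    # kt+Ab True
```

The extra layer round-trips because the marker is reserved; the same trick extends to any other annotation one cares to record.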

-g

This archive was generated by hypermail 2.1.5 : Mon May 23 2005 - 16:24:19 CDT