Re: Transliteration

From: Mark Davis (mark@macchiato.com)
Date: Fri Aug 03 2001 - 00:08:47 EDT


ISO distinguishes between transliteration (which represents letters) and
transcription (which represents sounds). We're following that usage when
talking about script transliteration.

What you are talking about, since it cannot be derived from simply the
letters, would be transcription. A classic case would be transcribing
English into another script, where cough, through, hiccough, though, etc.
would all have different vowels.
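
To make the distinction concrete, here is a minimal Python sketch of what a purely letter-based mapping does with those words. The table is invented for this example only (it is not ISO 9, not an ICU rule set, and not any published standard), and the names TABLE, WORDS and transliterate are hypothetical.

```python
# Toy Latin-to-Cyrillic letter table, invented for this example only.
# It is not ISO 9, not an ICU rule set, and not any published standard.
TABLE = {
    "c": "к", "g": "г", "h": "х", "i": "и",
    "o": "о", "r": "р", "t": "т", "u": "у",
}

def transliterate(word):
    """Map each letter independently; letters missing from TABLE pass through."""
    return "".join(TABLE.get(ch, ch) for ch in word.lower())

WORDS = ["cough", "through", "hiccough", "though"]
for w in WORDS:
    print(w, "->", transliterate(w))
```

Because the mapping sees only letters, every "ough" comes out as the same four target letters, whatever vowel the word is actually pronounced with. That is the expected behavior for transliteration, and the reason it cannot stand in for transcription.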

One could also have an ICU transliterator that did transcription -- but it
would, in general, be dictionary-driven, in many languages also needing
grammatical analysis for distinguishing homonyms. Very much like the initial
component of text-to-speech engines. Not what we are doing.
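
For contrast, a dictionary-driven transcriber might look roughly like the Python sketch below. Everything in it is an assumption made for illustration: the LEXICON entries, the grammatical tags, and the transcribe function are hypothetical, and the IPA strings are rough values for one dialect among several. It is not how ICU works.

```python
# Hypothetical mini-lexicon: (spelling, grammatical tag) -> rough IPA.
# A tag of None means the spelling alone is enough to pick a pronunciation.
LEXICON = {
    ("cough", None): "kɒf",
    ("through", None): "θruː",
    ("hiccough", None): "ˈhɪkʌp",
    ("though", None): "ðoʊ",
    # Same spelling, different sounds: the tag has to come from
    # grammatical analysis of the surrounding sentence.
    ("read", "present"): "riːd",
    ("read", "past"): "rɛd",
}

def transcribe(word, tag=None):
    """Return a pronunciation, or None when the lexicon cannot decide."""
    return LEXICON.get((word.lower(), tag))
```

Here transcribe("read") returns None, because the spelling alone does not say which pronunciation is meant, while transcribe("read", "past") resolves it. Unknown words also return None, which is why such a system needs a large dictionary rather than a short list of rules.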

Mark
—————

πάντων μέτρον ἄνθρωπος — Πρωταγόρας
[http://www.macchiato.com]

----- Original Message -----
From: "Philipp Reichmuth" <uzsv2k@uni-bonn.de>
To: <unicode@unicode.org>
Sent: Thursday, August 02, 2001 12:17
Subject: Re: Transliteration

> The proposed transliteration mechanism, while being quite flexible
> already through the rule mechanism, suffers from the principal
> weakness of having to know the morphology of the underlying word.
>
> For example, in Arabic ZDMG transcription, one can transliterate the
> sequence [Xuwwx] (i.e. strong consonant - damma - waw + shadda - vowel)
> in two ways: as [X 016B 0077 x] or as [X 0075 0077 0077 x], depending
> on whether the first w represents the long vowel u or the consonant w,
> which cannot be discerned from the Arabic script alone.
> For correctly transcribing this, the system needs detailed knowledge
> of Arabic noun and verb paradigms, which probably is beyond the scope
> of rule-based transliteration in the ICU framework.
>
> Now I do admit that this is a highly specialized case. I could imagine
> similar cases in other language/script environments as well, however.
> Unless one designs an extremely complicated ruleset, automatic
> transliteration will not achieve 100% accuracy (I don't know whether
> that is your goal). This goes well beyond the scope
> of character-based transliteration, though.
>
> Greetings
> Philipp mailto:uzsv2k@uni-bonn.de
> __________________________
> With searching comes loss / And the presence of absence / The server, not
> found
>


