From: Tom Emerson (email@example.com)
Date: Wed Mar 02 2005 - 14:36:00 CST
Gregg Reynolds writes:
> Not quite; depends on what you mean by "word". I'll give you a simple
> example to illustrate; a more detailed explanation would involve an
> explanation of how spoken Arabic works and how it is represented in
> written Arabic, which I'd be happy to provide if you're interested, but
> for now let's stick with an example.
No need, I work in Arabic daily, especially the regional variants.
> The word "risala#" [...] means roughly "letter, message". (I use # as
> teh marbuta.) Pronounced in isolation, the word ends in a soft 'h'
> sound - which is why the teh marbuta (in this form) looks like a 'heh'
> [...] Suffix the word with a personal pronoun (indicating possesion) and
> you get "risalat*kum" [...] (I use * to mean any short vowel). The
> pronunciation is /t/, just like the teh [...]
Right, this is normal Arabic orthography learned in Arabic 101. And I
agree with you that this isn't a first class letter and that it
serves a primarily morphological role. However, that role changes when
affixes are applied to the stem, and the teh marbuta becomes teh. In
risaalatikum the change is obvious. This is an intrinsic part of
Arabic orthography: I would be surprised if a native speaker thinks of
the teh in risaalatikum as teh marbuta (though I may be wrong).
It seems that you want to change the joining behavior so that
risalat*kum is encoded as 0631 0633 0627 0644 0629 0643 0645 with teh
marbuta instead of teh (062A), and have it rendered the same way. But
existing Arabic renderers do not think of things that way and this
would break pretty much all of them.
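To make the difference concrete, here is a small Python sketch that compares the two spellings code point by code point. The first string uses the sequence given above; the second is the conventional spelling with teh (062A) in place of teh marbuta (0629). The variable and function names are mine, purely for illustration. As far as I recall, the Unicode shaping data lists teh marbuta as right-joining and teh as dual-joining, which is why the first sequence would not connect to the following kaf in a stock renderer.

```python
import unicodedata

# Sequence from the message: teh marbuta (U+0629) kept before the suffix.
proposed = "\u0631\u0633\u0627\u0644\u0629\u0643\u0645"
# Conventional spelling: teh (U+062A) in the same position.
conventional = "\u0631\u0633\u0627\u0644\u062A\u0643\u0645"


def describe(text):
    """Return (hex code point, Unicode character name) for each character."""
    return [(f"{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# Collect the positions where the two spellings disagree.
differences = [
    (index, a, b)
    for index, (a, b) in enumerate(zip(describe(proposed), describe(conventional)))
    if a != b
]

for index, a, b in differences:
    print(index, a, b)
```

The two strings differ only at index 4, so any software that compares, sorts, or shapes text by code point treats them as different words.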
Finding risala# when searching for risalat*kum is a deeper issue,
which underlines the need for real linguistic analysis when doing
Arabic search. And even if this went through, ar-risala# wouldn't get
matched, or war-risala#, etc.
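A toy Python sketch of the search problem, under loud assumptions: the affix lists below are hypothetical and deliberately tiny (wa+al, al, and -kum only), and folding a stem-final teh back to teh marbuta will over-match real words that simply end in teh. It is not a morphological analyzer; it only shows that exact token matching misses every inflected form, while even a crude normalization key brings them together regardless of how the teh marbuta is encoded.

```python
TEH_MARBUTA = "\u0629"
TEH = "\u062A"

# Hypothetical, minimal affix lists for illustration only.
PREFIXES = ("\u0648\u0627\u0644", "\u0627\u0644")  # wa+al, al
SUFFIXES = ("\u0643\u0645",)                        # -kum


def crude_key(token):
    """Strip one known prefix and one known suffix, then fold final teh."""
    for prefix in PREFIXES:
        if token.startswith(prefix) and len(token) > len(prefix) + 2:
            token = token[len(prefix):]
            break
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            token = token[:-len(suffix)]
            break
    # Over-matches: any stem ending in teh is folded, not just teh marbuta stems.
    if token.endswith(TEH):
        token = token[:-1] + TEH_MARBUTA
    return token


query = "\u0631\u0633\u0627\u0644\u0629"                  # risala#
forms = [
    "\u0631\u0633\u0627\u0644\u062A\u0643\u0645",         # risalat*kum
    "\u0627\u0644\u0631\u0633\u0627\u0644\u0629",         # ar-risala#
    "\u0648\u0627\u0644\u0631\u0633\u0627\u0644\u0629",   # war-risala#
]

exact_hits = [form == query for form in forms]
keyed_hits = [crude_key(form) == crude_key(query) for form in forms]
print(exact_hits, keyed_hits)
```

Changing the encoding of the suffixed form would help with the first entry at most; the prefixed forms still need some analysis step like the one faked here.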
-- 
Tom Emerson, Basis Technology Corp.
Software Architect, http://www.basistech.com
"Beware the lollipop of mediocrity: lick it once and you suck forever"