Re: teh marbuta

From: Tom Emerson (tree@basistech.com)
Date: Wed Mar 02 2005 - 14:36:00 CST

    Gregg Reynolds writes:
    > Not quite; depends on what you mean by "word". I'll give you a simple
    > example to illustrate; a more detailed explanation would involve an
    > explanation of how spoken Arabic works and how it is represented in
    > written Arabic, which I'd be happy to provide if you're interested, but
    > for now let's stick with an example.

    No need; I work with Arabic daily, especially the regional variants.

    > The word "risala#" [...] means roughly "letter, message". (I use # as
    > teh marbuta.) Pronounced in isolation, the word ends in a soft 'h'
    > sound - which is why the teh marbuta (in this form) looks like a 'heh'
    > [...] Suffix the word with a personal pronoun (indicating possession) and
    > you get "risalat*kum" [...] (I use * to mean any short vowel). The
    > pronunciation is /t/, just like the teh [...]
    [...]

    Right, this is normal Arabic orthography learned in Arabic 101. And I
    agree with you that this isn't a first-class letter and that it
    serves a primarily morphological role. However, that role changes when
    affixes are applied to the stem, and the teh marbuta becomes teh. In
    risaalatikum the change is obvious. This is an intrinsic part of
    Arabic orthography: I would be surprised if a native speaker thinks of
    the teh in risaalatikum as teh marbuta (though I may be wrong).

    It seems that you want to change the joining behavior so that
    risalat*kum is encoded as 0631 0633 0627 0644 0629 0643 0645 with teh
    marbuta instead of teh (062A), and have it rendered the same way. But
    existing Arabic renderers do not think of things that way and this
    would break pretty much all of them.
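
    To make the two encodings concrete, here is a minimal Python sketch
    (the names proposed, conventional and describe are mine, purely for
    illustration). It spells out the sequence above next to the
    conventional spelling, which differs only in using teh (062A) where
    the proposal keeps teh marbuta (0629). In the Unicode Character
    Database (ArabicShaping.txt) 0629 is right-joining while 062A is
    dual-joining, which is why a renderer will not connect a teh marbuta
    to the kaf that follows it.

```python
import unicodedata

# Sequence from the message: risala + teh marbuta + kum (short vowels unwritten).
proposed = "\u0631\u0633\u0627\u0644\u0629\u0643\u0645"

# Conventional spelling: teh marbuta (U+0629) becomes teh (U+062A) before the suffix.
conventional = proposed.replace("\u0629", "\u062A")


def describe(text):
    """Return (code point, character name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# Show the two spellings side by side, flagging the one position where they differ.
for (cp_a, name_a), (cp_b, name_b) in zip(describe(proposed), describe(conventional)):
    marker = "  <-- differs" if cp_a != cp_b else ""
    print(f"{cp_a} {name_a:28} | {cp_b} {name_b}{marker}")
```

    The two strings are different code point sequences, so a byte-wise
    or code point comparison treats them as different words.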

    Finding risala# when searching for risalat*kum is a deeper issue,
    which underlines the need for deeper linguistic analysis when doing
    Arabic search. And even if this went through, ar-risala# wouldn't get
    matched, or war-risala#, etc.
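
    A toy illustration of why this is an analysis problem rather than an
    encoding problem, as a Python sketch. Everything here is hypothetical:
    search_key, the two-entry affix lists and the length threshold are
    made up for the example, and this is nowhere near real morphological
    analysis. It only shows that matching these four forms takes prefix
    and suffix handling on top of any teh / teh marbuta folding.

```python
TEH_MARBUTA = "\u0629"
TEH = "\u062A"

# Hypothetical, deliberately tiny affix lists; a real analyzer needs far more.
PREFIXES = ("\u0648\u0627\u0644", "\u0627\u0644", "\u0648")  # wa+al-, al-, wa-
SUFFIXES = ("\u0643\u0645",)                                 # -kum


def search_key(word):
    """Crude search key: strip one known prefix and suffix, then restore teh marbuta."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= 3:
            word = word[len(prefix):]
            break
    stripped_suffix = False
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[:-len(suffix)]
            stripped_suffix = True
            break
    # Heuristic only: a teh exposed by suffix stripping MAY be an underlying
    # teh marbuta. This over-matches stems that really end in teh.
    if stripped_suffix and word.endswith(TEH):
        word = word[:-1] + TEH_MARBUTA
    return word


risala = "\u0631\u0633\u0627\u0644\u0629"                   # risala#
risalatkum = "\u0631\u0633\u0627\u0644\u062A\u0643\u0645"   # risalat*kum
ar_risala = "\u0627\u0644" + risala                         # ar-risala#
war_risala = "\u0648" + ar_risala                           # war-risala#

# Whole-word comparison on the raw strings finds none of the variants.
raw_matches = [w == risala for w in (risalatkum, ar_risala, war_risala)]

# After the crude normalization all four forms share one key.
keys = {search_key(w) for w in (risala, risalatkum, ar_risala, war_risala)}
```

    Note that encoding risalat*kum with teh marbuta would only help with
    the second form; the prefixed forms still need the stripping step.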

        -tree

    -- 
    Tom Emerson                                          Basis Technology Corp.
    Software Architect                                 http://www.basistech.com
      "Beware the lollipop of mediocrity: lick it once and you suck forever"
    

