Re: 5 Hebrew Consonances Shaping

From: Arno Schmitt (arno@zedat.fu-berlin.de)
Date: Fri Jun 04 1999 - 04:20:11 EDT


Dear John Cowan,
you put forward two main arguments:
1.) "the presence or absence of a final form is a matter of spelling
(i.e. it is an error to use a final for a non-final, or a non-final
for a final)"
2.) there is no need for "avoiding exact matching", i.e. why should
you want to treat the two shapes of mem as one LETTER?

Why is the difference between mem and final mem a matter of
spelling in Hebrew and the difference between the four shapes of
mim in Arabic not?
Writing the medial form of mim at the end of an Arabic word is IMHO
exactly the same as writing the final form of mem in the middle of
a word -- both very unlikely "spelling errors".
If someone comes up with pairs of correct Hebrew words that are
differentiated by the one being written with mem and the other
with final mem, the one with tsadi, the other with final tsadi,
one with kaf, the other with final chaf -- like genossen -
Genossen, la Scala - la scala, la manche - la Manche, gut - Gut --
the matter would be settled. This would be proof enough that it is
a matter of spelling.

Up till now I accepted that there are exceptions in Hebrew.
After careful study I want to go a step further:
Leaving aside pe/fe there are only regular exceptions:
1.) abbreviations with a geresh at the end
so the automatic shaping algorithm must learn: do not take the
final shape whenever a letter, a ZWJ, or a geresh follows
2.) abbreviations with gershaim before the final letter
so the automatic shaping algorithm must learn: do not take the
final shape for a letter after gershaim even if followed by a
separator
3.) the very special case of Israeli political party voting
symbols (like mahal, emet)
but to handle these we have the ZWJ
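
To make the three rules concrete, here is a minimal sketch in Python of
such a shaping step. It assumes the text is stored with nominal letters
only and that the final shape is chosen at display time. The names
(shape, FINAL_FORM, is_hebrew_letter) are my own, pe/fe is left out as
above, and only U+05F3 (geresh) and U+05F4 (gershayim) are recognized,
not the ASCII apostrophe or quotation mark that are often typed instead.

```python
# Sketch only: display-time selection of final shapes from nominal letters.
ZWJ = "\u200D"        # ZERO WIDTH JOINER
GERESH = "\u05F3"     # HEBREW PUNCTUATION GERESH
GERSHAYIM = "\u05F4"  # HEBREW PUNCTUATION GERSHAYIM

# Nominal letter -> final shape. Pe is deliberately omitted (assumption:
# pe/fe is handled separately, as the text sets it aside).
FINAL_FORM = {
    "\u05DB": "\u05DA",  # kaf   -> final kaf
    "\u05DE": "\u05DD",  # mem   -> final mem
    "\u05E0": "\u05DF",  # nun   -> final nun
    "\u05E6": "\u05E5",  # tsadi -> final tsadi
}


def is_hebrew_letter(ch):
    """True for a single character in the range alef..tav (U+05D0..U+05EA)."""
    return len(ch) == 1 and "\u05D0" <= ch <= "\u05EA"


def shape(text):
    """Return text with final shapes substituted where the rules allow."""
    out = []
    for i, ch in enumerate(text):
        final = FINAL_FORM.get(ch)
        nxt = text[i + 1] if i + 1 < len(text) else ""
        prev = text[i - 1] if i > 0 else ""
        # Rules 1 and 3: a following letter, ZWJ, or geresh blocks the final shape.
        # Rule 2: a preceding gershayim blocks it even before a separator.
        blocked = (
            is_hebrew_letter(nxt)
            or nxt in (ZWJ, GERESH)
            or prev == GERSHAYIM
        )
        out.append(ch if final is None or blocked else final)
    return "".join(out)
```

With this sketch, a stored shin-lamed-vav-mem comes out with final mem,
while a mem followed by a geresh or a ZWJ, or a kaf preceded by
gershayim, keeps its nominal shape.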

Now to your argument: it serves no purpose.
In English a machine easily finds "house" in "my house" and in
"houses".
If final shapes are stored as different letters -- and not only
displayed as final shapes, as I believe they should be -- "kapaim"
and "kafex" are not found when the machine looks for "kaf", but they
should be -- unless I tell it "whole word only".
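
A small sketch of the search problem, again in Python. It assumes the
current situation, where final shapes are stored as their own code
points, and shows that a plain substring search misses the match unless
the finals are first folded onto the nominal letters. The helper names
(FOLD, fold_finals, contains) and the spellings I assume for "kaf"
(kaf, final pe) and "kapaim" (kaf, pe, yod, yod, final mem) are mine.

```python
# Sketch only: searching Hebrew text when final shapes are separate code points.

# Final shape -> nominal letter, for all five pairs.
FOLD = str.maketrans({
    "\u05DA": "\u05DB",  # final kaf   -> kaf
    "\u05DD": "\u05DE",  # final mem   -> mem
    "\u05DF": "\u05E0",  # final nun   -> nun
    "\u05E3": "\u05E4",  # final pe    -> pe
    "\u05E5": "\u05E6",  # final tsadi -> tsadi
})


def fold_finals(s):
    """Replace every final shape by its nominal letter."""
    return s.translate(FOLD)


def contains(haystack, needle):
    """Substring search that treats nominal and final shapes as one letter."""
    return fold_finals(needle) in fold_finals(haystack)


KAF = "\u05DB\u05E3"                    # assumed spelling of "kaf"
KAPAIM = "\u05DB\u05E4\u05D9\u05D9\u05DD"  # assumed spelling of "kapaim"

plain_hit = KAF in KAPAIM             # exact code point matching
folded_hit = contains(KAPAIM, KAF)    # matching after folding
```

Here plain_hit is False and folded_hit is True. If only nominal letters
were stored, the folding step would not be needed at all, which is the
point of the argument.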
 
John Cowan wrote:

> > 1.) If this is true for feh/peh, why should the other four final
> > shape have separate codepoints?
>
> I have no evidence on their distribution, but evidence of absence
> is not absence of evidence.

But since in all books and e-mails only pe/fe examples are given,
it seems pretty safe that there are no cases of non-final forms
of the others occurring -- apart from the three groups of
"words" mentioned above.
>
> > 2.) There are two inaccuracies in "final PEH normally denotes [f]"
> > a) not normally, but always
>
> Since the orthography of Hebrew (like that of English) is established
> by convention only, with no specific authority, I hesitate to say
> "always" about any such point.

cautious and reasonable
  
> > b) therefore it makes no sense to talk of "final PEH", it
> > should be "final FEH"
>
> I use the term PEH here as an ASCII equivalent of the basic letter,
> nominal or final, with or without dagesh. (So does my source.)
>
> > a+-b) there are no final peh and caf
>
> In your sense of the term you are correct. However, common
> convention and the Unicode Standard both speak of "final pe(h)."
>


