From: Peter Kirk (peter.r.kirk@ntlworld.com)
Date: Wed Jul 30 2003 - 14:13:07 EDT
On 30/07/2003 09:25, Ted Hopp wrote:
>On Wednesday, July 30, 2003 8:21 AM, Peter Kirk wrote:
>
>
>>... The vowel form,
>>Ted's holam male, is encoded as holam followed by vav, and the consonant
>>vav with holam is encoded simply as that.
>>
>>
>
>Encoding 05B9 before the vav to create a kholam male can be a complicated
>business. Consider the (non-authentic) spelling used in the hugely popular
>"501 Hebrew Verbs" by Shmuel Bolozky (Barron's), where vowels and ketiv male
>(plene spelling) are mixed. (This is frequently done for pedagogical
>applications.) A particularly striking word is borrowers (f): <lamed-kholam
>male-vav-kholam male-tav>. Under the proposal, that would be encoded
>[05DC.05B9.05D5.05D5.05B9.05D5.05EA] -- somewhat difficult to parse, if you
>ask me. ...
>
This is complicated, but not actually ambiguous. To simplify, let's use
the CCAT encoding in which this would be written LOWWOWT. By the
algorithm used in Ezra SIL and in SBL Hebrew, each O before a W is
shifted from the left of the preceding consonant to the right of the W,
i.e. treated as holam male, as long as the W has no (other) vowel. This
rule applies to both of these O's, so the word should be rendered correctly.
Test - view with Ezra SIL or SBL Hebrew (there is a known bug with the
latest beta version of the latter):
לֹווֹות
Result: nearly right in Ezra SIL, but the second holam has not shifted
on to the following vav. Maybe the shift from vav to vav is disabled for
some reason. SBL Hebrew has the same problem; it also fails to
distinguish the two positions of holam on vav (known bug).
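To make the shifting rule described above concrete, here is a minimal
sketch in Python. It is not the actual logic of Ezra SIL or SBL Hebrew
(the fonts do this in their own tables); the function name and the
simplified vowel set are my assumptions for illustration only. It
returns the index of each O that the rule would treat as holam male.

```python
# Hypothetical, simplified vowel set for this sketch; it is an assumption
# for illustration and not the full CCAT transliteration table.
VOWELS = set("AEIOU:")


def shifted_holams(word):
    """Return the indexes of each O that the rule treats as holam male."""
    shifted = []
    for i, ch in enumerate(word):
        if ch != "O":
            continue
        # The O must be immediately followed by a vav (W).
        if i + 1 >= len(word) or word[i + 1] != "W":
            continue
        # The vav must not carry a vowel of its own.
        if i + 2 < len(word) and word[i + 2] in VOWELS:
            continue
        shifted.append(i)
    return shifted


for w in ("LOWWOWT", "LOWWOT", "LOWOWT", "QOWOW"):
    print(w, shifted_holams(w))
```

For LOWWOWT both O's qualify, which is the rendering the rule calls for
even though the fonts tested here do not yet shift the second one.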
>... There will also be a bad ambiguity for the present, female, plural
>of borrow: <lamed-kholam male-vav-kholam chaser-tav>. The resulting encoding
>under the proposal is [05DC.05B9.05D5.05D5.05B9.05EA]. This could also be
>interpreted <lamed-kholam chaser-vav-vav-cholam khaser-tav> (with the
>reasonable but incorrect interpretation that the double-vav is to indicate a
>consonantal vav, ...
>
This also comes out correctly. We have LOWWOT. The first O shifts to
make holam male. The second one does not, as O does not shift on to T.
So we have the two different positionings of holam on vav next to one
another, something which, by the way, never happens in the Hebrew Bible. Test:
לֹווֹת
Result: exactly right in Ezra SIL; SBL Hebrew fails to distinguish the
two positions of holam on vav (known bug).
I suppose an alternative form which might appear would be LOWOWT, with
the first vowel holam haser and the second holam male. In this case the
first O would stay with the L as the following W has an O, but the
second O would shift to the top right of the second W. Test:
לֹוֹות
Result: again exactly right in Ezra SIL and in SBL Hebrew.
Then how would Jony Rosenne's preferred encoding fare here? He would
encode the former (my LOWWOT) as LWOWOT. After the L, my suggested
algorithm for distinguishing the two readings (unimplemented, so I can't
test it) expects a vowel and so interprets WO as holam male, and after
holam male it expects a consonant and so interprets the next WO as vav
plus holam. Correct. The second form (my LOWOWT) he would encode as
LOWWOT, with holam haser first. No problem with that. Then vav on its
own, a consonant, so a vowel is expected to follow. So the following
vav holam is interpreted as holam male. Correct.
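The expectation-based reading described above can be sketched as a
small state machine. This is only my illustration of the idea, which
remains unimplemented in any font as far as this message goes; the
function name and labels are hypothetical, and the sketch assumes that O
is the only vowel and every other letter is a consonant.

```python
def parse_vav_first(word):
    """Label each unit of a word in which holam male is written W then O.

    Assumption for this sketch: O is the only vowel; every other letter
    is treated as a consonant.
    """
    units = []
    expecting_vowel = False  # True right after a bare consonant
    i = 0
    while i < len(word):
        if word.startswith("WO", i):
            # Same two characters, two readings, chosen by context.
            units.append("holam male" if expecting_vowel else "vav+holam")
            expecting_vowel = False
            i += 2
        elif word[i] == "O":
            units.append("holam haser")
            expecting_vowel = False
            i += 1
        else:
            units.append(word[i])  # a consonant, so a vowel may follow
            expecting_vowel = True
            i += 1
    return units


for w in ("LWOWOT", "LOWWOT", "QOWWO"):
    print(w, parse_vav_first(w))
```

A real implementation would need the full vowel inventory and would
have to cope with consonants that carry no vowel, which is part of why
this version is less simple than the holam-before-vav rule.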
>... analogous to the past tense, female, second person of
>borrow: <lamed-qamats-vav-vav-qamats-he>.).
>
To me as a reader of biblical Hebrew, this form looks like an error. I
would expect either sheva under the first vav, or the two vavs to be
combined into one with dagesh. Nowhere in the Bible do two consonantal
vavs occur together, without a full vowel between them.
>
>How would one interpret: [05E7.05B9.05D5.05B9.05D5]? This is how the
>proposed scheme would encode a word that appears in Brown-Driver-Briggs under
>entry I for kavah (qof-qamats, vav-qamats, he). (It should be interpreted
><qof-kholam khaser-vav-kholam male>. How'd you do?)
>
>
QOWOW. First W is followed by O, so first O doesn't shift and W is taken
as a consonant. Second W is not followed by a vowel so second O shifts,
holam male. Yes, I think it's right. Test:
קֹוֹו
Result: correct in Ezra SIL and in SBL Hebrew.
Jony would encode QOWWO. That would also come out correct.
>It seems to me that it will be difficult-to-impossible to develop a parsing
>algorithm for this kind of thing, ...
>
I think we need to congratulate Joan, John H, and those who worked with
them for successfully doing the impossible. It works now, Ted. Well,
very nearly. The small problems I identified are easily fixable. The
version of the algorithm which works with Jony's encoding is less simple
so I am not yet sure if it is possible.
>... even without considering things like
>transliterations and other irregular applications. Combining characters
>should follow their base characters. We just have to live without the kholam
>male for now (or create it using "markup", which can apparently solve all
>problems).
>
Actually "markup" solves no problems at all; it just passes the buck and
reinforces the impression many already have that Unicode is a waste of
time because it can't do what they need.
But why live without the holam male? After all, if it is a separate form
in Hebrew (and we have established, I think, that it has been for 1000
years), and since you don't like the way which some have used to encode
it, why not add it to Unicode as a separate new character? After all, if
the French had found that one of their accented characters was not in
Unicode, I don't think they would have said that they could live without
it or use markup. They would have fought tooth and nail to get it added
to the standard. Why don't you suggest that? That's not a breach of the
stability policy. (Maybe the preferred addition would be a new combining
mark, right holam, rather than a new precomposed character, but that is
a detail.)
-- Peter Kirk peter.r.kirk@ntlworld.com http://web.onetel.net.uk/~peterkirk/