Re: From [b-hebrew] Variant forms of vav with holem

From: Peter Kirk (
Date: Wed Jul 30 2003 - 14:13:07 EDT

    On 30/07/2003 09:25, Ted Hopp wrote:

    >On Wednesday, July 30, 2003 8:21 AM, Peter Kirk wrote:
    >>... The vowel form,
    >>Ted's holam male, is encoded as holam followed by vav, and the consonant
    >>vav with holam is encoded simply as that.
    >Encoding 05B9 before the vav to create a kholam male can be a complicated
    >business. Consider the (non-authentic) spelling used in the hugely popular
    >"501 Hebrew Verbs" by Shmuel Bolozky (Barron's), where vowels and ketiv male
    >(plene spelling) are mixed. (This is frequently done for pedagogical
    >applications.) A particularly striking word is borrowers (f): <lamed-kholam
    >male-vav-kholam male-tav>. Under the proposal, that would be encoded
    >[05DC.05B9.05D5.05D5.05B9.05D5.05EA] -- somewhat difficult to parse, if you
    >ask me. ...
    This is complicated, but not actually ambiguous. To simplify, let's use
    the CCAT encoding in which this would be written LOWWOWT. By the
    algorithm used in Ezra SIL and in SBL Hebrew, each O before a W is
    shifted from the left of the preceding consonant to the right of the W,
    i.e. treated as holam male, as long as the W has no (other) vowel. This
    rule applies to both of these O's, so the word should be rendered correctly.
    Test - view with Ezra SIL or SBL Hebrew (there is a known bug with the
    latest beta version of the latter):

    Result: nearly right in Ezra SIL, but the second holam has not shifted
    on to the following vav. Maybe the shift from vav to vav is disabled for
    some reason. SBL Hebrew has the same problem, and it also fails to
    distinguish the two positions of holam on vav (known bug).
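
    The shifting rule described above can be sketched in a few lines of
    Python. This is only an illustration of the rule as stated in this
    message, not the actual logic of the Ezra SIL or SBL Hebrew fonts. The
    function name classify_holams and the simplified vowel set are
    assumptions made for the example.

```python
# Sketch of the "holam before vav" rule described in this message.
# Assumptions: CCAT-style input, O = holam, W = vav, and a simplified
# vowel set; classify_holams is a hypothetical name, not a font API.
VOWELS = set("AEIOU")

def classify_holams(word):
    """Return 'male' or 'haser' for each O in the word, in order."""
    kinds = []
    for i, ch in enumerate(word):
        if ch != "O":
            continue
        followed_by_vav = word[i + 1:i + 2] == "W"
        # A vav that carries its own vowel is a consonant, so no shift.
        vav_has_vowel = word[i + 2:i + 3] in VOWELS
        if followed_by_vav and not vav_has_vowel:
            kinds.append("male")   # holam shifts on to the vav
        else:
            kinds.append("haser")  # holam stays with its consonant
    return kinds

print(classify_holams("LOWWOWT"))  # ['male', 'male']
```

    For LOWWOWT the sketch marks both holams as male, which is the
    rendering argued for above.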

    >... There will also be a bad ambiguity for the present, female, plural
    >of borrow: <lamed-kholam male-vav-kholam chaser-tav>. The resulting encoding
    >under the proposal is [05DC.05B9.05D5.05D5.05B9.05EA]. This could also be
    >interpreted <lamed-kholam chaser-vav-vav-cholam khaser-tav> (with the
    >reasonable but incorrect interpretation that the double-vav is to indicate a
    >consonantal vav, ...
    This also comes out correctly. We have LOWWOT. The first O shifts to
    make holam male. The second one does not, as O does not shift on to T.
    So we have the two different positionings of holam on vav next to one
    another, something which, by the way, never happens in the Hebrew Bible.
    Test:


    Result: exactly right in Ezra SIL; SBL Hebrew fails to distinguish the
    two positions of holam on vav (known bug).

    I suppose an alternative form which might appear would be LOWOWT, with
    the first vowel holam haser and the second holam male. In this case the
    first O would stay with the L as the following W has an O, but the
    second O would shift to the top right of the second W. Test:


    Result: again exactly right in Ezra SIL and in SBL Hebrew.

    Then how would Jony Rosenne's preferred encoding fare here? He would
    encode the former LWOWOT. After the L, my suggested algorithm for
    distinguishing the two forms (unimplemented, so I can't test it) expects
    a vowel and so interprets WO as holam male, and after holam male it
    expects a consonant and so interprets the next WO as vav plus holam.
    Correct. The second form he would encode as LOWWOT, with holam haser
    first. No problem with that. Then comes vav on its own, a consonant, so
    a vowel is expected to follow, and the following WO is interpreted as
    holam male. Correct.
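
    One possible reading of this unimplemented suggestion is a small state
    machine that tracks whether a vowel or a consonant is expected next.
    The sketch below is an assumption-laden illustration, not a tested
    algorithm: the name parse_holam_after_vav, the token labels and the
    handling of consonants without a vowel are all hypothetical.

```python
# Sketch for the "holam after vav" encoding, where WO is ambiguous.
# Assumptions: CCAT-style input, O = holam, W = vav; other vowels are
# limited to a simplified set. All names here are hypothetical.
OTHER_VOWELS = set("AEIU")

def parse_holam_after_vav(word):
    """Split a word into consonants and vowels, resolving each WO."""
    tokens = []
    expecting_vowel = False  # True right after a consonant with no vowel yet
    i = 0
    while i < len(word):
        ch = word[i]
        if ch == "W" and word[i + 1:i + 2] == "O":
            if expecting_vowel:
                tokens.append("holam male")         # WO is the vowel
            else:
                tokens.extend(["W", "holam haser"])  # consonant vav + holam
            expecting_vowel = False
            i += 2
        elif ch == "O":
            tokens.append("holam haser")
            expecting_vowel = False
            i += 1
        elif ch in OTHER_VOWELS:
            tokens.append(ch)
            expecting_vowel = False
            i += 1
        else:
            tokens.append(ch)  # any other letter is taken as a consonant
            expecting_vowel = True
            i += 1
    return tokens

print(parse_holam_after_vav("LWOWOT"))
```

    On LWOWOT this reading gives lamed, holam male, vav, holam haser, tav,
    and on LOWWOT it gives lamed, holam haser, vav, holam male, tav, which
    matches the two interpretations walked through above.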

    >... analogous to the past tense, female, second person of
    >borrow: <lamed-qamats-vav-vav-qamats-he>.).
    To me as a reader of biblical Hebrew, this form looks like an error. I
    would expect either sheva under the first vav, or the two vavs to be
    combined into one with dagesh. Nowhere in the Bible do two consonantal
    vavs occur together, without a full vowel between them.

    >How would one interpret: [05E7.05B9.05D5.05B9.05D5]? This is how the
    >proposed scheme would encode a word that appears in Brown-Driver-Briggs under
    >entry I for kavah (qof-qamats, vav-qamats, he). (It should be interpreted
    ><qof-kholam khaser-vav-kholam male>. How'd you do?)
    QOWOW. First W is followed by O, so first O doesn't shift and W is taken
    as a consonant. Second W is not followed by a vowel so second O shifts,
    holam male. Yes, I think it's right. Test:


    Result: correct in Ezra SIL and in SBL Hebrew.

    Jony would encode QOWWO. That would also come out correct.
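
    The step from the dotted code-point sequences quoted in this message to
    the CCAT-style shorthand is mechanical and can be sketched as follows.
    The table covers only the five code points that occur here, and the
    function names to_ccat and char_names are assumptions made for the
    example.

```python
import unicodedata

# Assumption: only the code points quoted in this message are mapped.
CCAT = {
    0x05DC: "L",  # lamed
    0x05E7: "Q",  # qof
    0x05D5: "W",  # vav
    0x05EA: "T",  # tav
    0x05B9: "O",  # holam
}

def to_ccat(dotted_hex):
    """Convert e.g. '05E7.05B9.05D5' to the CCAT-style string 'QOW'."""
    return "".join(CCAT[int(cp, 16)] for cp in dotted_hex.split("."))

def char_names(dotted_hex):
    """Return the Unicode character name of each code point."""
    return [unicodedata.name(chr(int(cp, 16))) for cp in dotted_hex.split(".")]

print(to_ccat("05E7.05B9.05D5.05B9.05D5"))  # QOWOW
print(char_names("05E7.05B9"))
```

    A code point outside the table raises KeyError, which seems preferable
    here to silently guessing a transliteration.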

    >It seems to me that it will be difficult-to-impossible to develop a parsing
    >algorithm for this kind of thing, ...
    I think we need to congratulate Joan, John H, and those who worked with
    them for successfully doing the impossible. It works now, Ted. Well,
    very nearly. The small problems I identified are easily fixable. The
    version of the algorithm which works with Jony's encoding is less
    simple, so I am not yet sure whether it is possible.

    >... even without considering things like
    >transliterations and other irregular applications. Combining characters
    >should follow their base characters. We just have to live without the kholam
    >male for now (or create it using "markup", which can apparently solve all
    Actually "markup" solves no problems at all, it just passes the buck and
    reinforces the impression many already have that Unicode is a waste of
    time because it can't do what they need.

    But why live without the holam male? After all, if it is a separate form
    in Hebrew (and we have established, I think, that it has been for 1000
    years), and since you don't like the way which some have used to encode
    it, why not add it to Unicode as a separate new character? After all, if
    the French had found that one of their accented characters was not in
    Unicode, I don't think they would have said that they could live without
    it or use markup. They would have fought tooth and nail to get it added
    to the standard. Why don't you suggest that? That's not a breach of the
    stability policy. (Maybe the preferred addition would be a new combining
    mark, right holam, rather than a new precomposed character, but that is
    a detail.)

    Peter Kirk

