Re: Yerushala(y)im - or Biblical Hebrew

From: Peter Kirk
Date: Tue Jul 29 2003 - 10:22:35 EDT

    On 28/07/2003 23:37, Jony Rosenne wrote:

    >We had a discussion in the SII and the consensus was that we should object
    >- any change or addition related to Hebrew that would invalidate existing
    >Unicode data or require its modification or re-examination
    I can agree that any change should not invalidate existing valid data.
    But that shouldn't imply that we must validate existing invalid data.
    There is a lot of existing data which, although encoded in Unicode
    characters, is invalid or mis-spelled in one sense or another,
    deliberately so in order to kludge a reasonably good visual
    representation from bad old software. For example, a ZWJ is inserted
    after vav and before holam when the vav is a consonant, because with
    certain software and font combinations that has the required effect of
    shifting the holam to the left. We can't simply declare in some kind of
    amnesty that every existing text is validly encoded.
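
    To make the ZWJ example concrete, here is a minimal Python sketch that
    finds the vav + ZWJ + holam sequence in a text. The function names are
    hypothetical, and it assumes the ZWJ sits directly between the vav and
    the holam with nothing else in between.

```python
import re

VAV = "\u05D5"    # HEBREW LETTER VAV
HOLAM = "\u05B9"  # HEBREW POINT HOLAM
ZWJ = "\u200D"    # ZERO WIDTH JOINER

# Vav, ZWJ, holam, in exactly that order (assumed form of the kludge).
KLUDGE = re.compile(VAV + ZWJ + HOLAM)


def find_zwj_kludges(text):
    """Return the index of each vav that starts a vav + ZWJ + holam sequence."""
    return [m.start() for m in KLUDGE.finditer(text)]


def strip_zwj_kludges(text):
    """Drop the ZWJ from each vav + ZWJ + holam sequence; other ZWJs are kept."""
    return KLUDGE.sub(VAV + HOLAM, text)


# ayin, hataf patah, vav, ZWJ, holam, final nun
sample = "\u05E2\u05B2\u05D5\u200D\u05B9\u05DF"
print(find_zwj_kludges(sample))
```

    Note that stripping the ZWJ also throws away the only record that the
    vav was meant as a consonant, so finding the sequence is safer than
    silently removing it.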

    >- any change or addition to Unicode that would make the use of Hebrew more
    >complicated or confuse the common user
    Absolutely. But nothing confuses the common user more than not knowing
    how he or she is supposed to encode a particular text. What is needed is
    not so much changes to Unicode as clear guidelines for the common user.

    >- any change or addition to Unicode that would require a user of Hebrew to
    >have a higher level of knowledge, e.g. to distinguish between items not
    >commonly distinguished, for example the two meanings of Vav with Holam.
    Are we confining "user of Hebrew" to people who know how to speak the
    language? If so these people already know how to distinguish the two
    meanings of vav with holam because they pronounce them quite
    differently. Some users of biblical Hebrew may not know the
    pronunciation, but I don't think these are the people you have in mind.

    On the other hand, if you are determined that these two graphically and
    semantically distinct entities should be encoded identically, then at
    least those of us who want or need to make a graphical or semantic
    distinction are not entirely stuffed, i.e. left without a way ahead. For
    it does seem to be possible to determine algorithmically, though not
    entirely without ambiguity in some theoretical cases, which vav with
    holam is which - the only ambiguity would be in cases where the word
    before the vav with holam consists only of a string of vavs with dagesh
    of which the first may be a vowel (shuruq) or a consonant.
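
    A rough Python sketch of one way such a determination might work. The
    rule used here is my assumption, not a specification: a vav with holam
    is read as a vowel when the preceding letter has no vowel point of its
    own, and as a consonant otherwise. All names are hypothetical, and
    quiescent letters such as a silent alef are not handled.

```python
VAV, HOLAM, DAGESH = "\u05D5", "\u05B9", "\u05BC"
# U+05B0..U+05BB (sheva through qubuts) plus qamats qatan.
VOWELS = {chr(c) for c in range(0x05B0, 0x05BC)} | {"\u05C7"}


def clusters(word):
    """Split one pointed word into (base letter, set of following marks)."""
    out = []
    for ch in word:
        if "\u05D0" <= ch <= "\u05EA":   # Hebrew letters alef..tav
            out.append((ch, set()))
        elif out:                        # anything else joins the previous letter
            out[-1][1].add(ch)
    return out


def needs_vowel(cl, j):
    """True if letter j is a consonant with no vowel of its own, False if it
    has one or is itself a vowel letter (shuruq), None if undecidable."""
    base, marks = cl[j]
    if marks & VOWELS:
        return False
    if base == VAV and DAGESH in marks:
        # Shuruq or consonant with dagesh: decided by the letter before it.
        if j == 0:
            return None
        prev = needs_vowel(cl, j - 1)
        return None if prev is None else (not prev)
    return True


def classify_vav_holam(word):
    """Return (letter index, label) for each vav carrying holam in one word."""
    cl = clusters(word)
    result = []
    for i, (base, marks) in enumerate(cl):
        if base != VAV or HOLAM not in marks:
            continue
        if i == 0:
            label = "consonant"          # nothing before it to carry the vowel
        else:
            prev = needs_vowel(cl, i - 1)
            label = "ambiguous" if prev is None else ("vowel" if prev else "consonant")
        result.append((i, label))
    return result
```

    The "ambiguous" label is returned only when the vav with holam is
    preceded by nothing but vavs with dagesh back to the start of the word,
    which is the theoretical case described above.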

    >- the suggestion to encode Biblical Hebrew separately is unacceptable.
    I am glad to hear this clearly stated. I agree.

    >The requirements of professional and knowledgeable users, such as Biblical
    >scholars, should not be allowed to impose upon everyday users who are not
    >blessed with such a profound knowledge and understanding.
    Indeed. But also support for the special requirements of scholars should
    not be restricted just because it goes beyond the requirements of
    everyday users.

    >Consequently, it was suggested that the several issues with Biblical Hebrew
    >recently mentioned, and several more which were not, should be solved by
    >means of markup, outside the scope of Unicode. This is how they have been
    >addressed in many of the references given. This is our recommendation.
    What references are you referring to? Haralambous? I accept that markup
    may be suitable for the rare cases of enlarged, reduced, raised and
    broken letters which he mentions, as these are semantically the base
    letter plus some essentially extra-textual information. But markup is
    not appropriate for distinguishing between commonly occurring letters
    which are distinct semantically and phonetically, as well as very often
    graphically, like the different forms of vav with holam. Or is markup
    being suggested as a solution of the Yerushala(y)im issue? If so I fail
    to see how it addresses the problem, as markup does not inhibit

    >Failing that, it was suggested that an existing Unicode character, such as
    >ZERO WIDTH NO-BREAK SPACE, be used for "invisible" Hebrew letters, in cases
    >such as Yerushala(y)im.
    As there are many objections to ZWNBSP, would CGJ be an acceptable
    alternative? But I do see why you might prefer to use a zero width base
    character here rather than a combining character, although that would
    not be appropriate for mittaxat in Exodus 20:4 and for right meteg.

    >The third, and least favored, option is to add a special Unicode character
    >to represent missing base characters such as the Yod in Yerushala(y)im.

    Peter Kirk
