RE: Yerushala(y)im - or Biblical Hebrew

From: Peter Kirk
Date: Sun Jul 06 2003 - 19:15:54 EDT

    Peter Constable wrote on Thu Jul 03 2003 - 11:52:52 EDT:

    > Jony Rosenne wrote on 07/02/2003 05:55:02 AM:
    > > I would like to summarize my understanding:
    > I agree with you on most points, but would quibble on the first, as I
    > find it overgeneralizes and is not explicit enough.
    > > 1. The sequence Lamed Patah Hiriq is invalid for Hebrew. It is
    > > invalid in Hebrew to have two vowels for one letter. It may or may
    > > not be a valid Unicode sequence, but there are many examples of
    > > valid Unicode sequences that are invalid.
    > We need to state more carefully *what* is invalid. The facts are that
    > spellings such as lamed patah hiriq *are* attested in literature and
    > encoded representations are needed for them. These spellings are invalid
    > as written representations of Hebrew that are consistent with Hebrew
    > phonology; but their use in literature is not assumed to be consistent
    > with Hebrew phonology; they are used *in spite of the fact* that they are
    > inconsistent with Hebrew phonology. This is not normal Hebrew spelling,
    > but the literature to be encoded includes abnormal spellings, and they
    > have as much need to be represented as the normal spellings.
    > It appears to me that you are trying to establish invalidity of such
    > sequences as a basis to argue that encoded representations should involve
    > some character between the two vowels. I consider this reasoning flawed,
    > however: the encoded representation is a representation of the *text*, not
    > the phonology, and the text most certainly does include sequences such as
    > lamed patah hiriq. It may be that we end up deciding to adopt an encoded
    > representation for this that involves a character between the two vowels,
    > but that is a technical-design choice, and not something that we are
    > compelled to do because of the nature of the Hebrew language and normal
    > conventions of Hebrew spelling.
    > - Peter
    > ---------------------------------------------------------------------------
    > Peter Constable
    > Non-Roman Script Initiative, SIL International
    > 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
    > Tel: +1 972 708 7485
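
    [An aside to illustrate the point at issue, a sketch added here and not
    part of either original message: the sequence lamed + patah + hiriq is a
    perfectly well-formed Unicode code point sequence, but canonical
    normalization sorts combining marks by combining class, and hiriq
    (class 14) sorts before patah (class 17), so the as-written vowel order
    is not preserved.]

```python
import unicodedata

# Lamed (U+05DC) + patah (U+05B7) + hiriq (U+05B4), in as-written order.
seq = "\u05DC\u05B7\u05B4"

# Canonical normalization reorders the marks by combining class:
# hiriq has ccc 14, patah has ccc 17, so hiriq moves before patah.
nfc = unicodedata.normalize("NFC", seq)
print(nfc == "\u05DC\u05B4\u05B7")  # the vowel order was not preserved
```

    [This reordering under normalization is part of why an intervening
    character between the two vowels was under discussion at all.]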

    Like Ted Hopp, I have been reading through the recent postings on
    Hebrew, because I saw the proposal for encoding a separate set of
    biblical Hebrew vowels and was seriously concerned by it. For ten years
    until last year I was a member of SIL International, working with the
    biblical Hebrew text, and regularly provided technical input to Peter
    Constable and his colleagues on Hebrew and other non-Roman scripts.
    Before joining SIL I was a software developer and served on ECMA
    standards committees.

    I have a couple of points to make now on this issue. First, it might
    help to get an idea of the scale of the problem. In the WTS encoded text
    of the BHS Hebrew Bible, which comes to 5.25 MB in UTF-8 and so contains
    a million or so vowel points, there are just 637 instances of two vowel points on
    one consonant. Of these, 636 are the word Yerushala(y)im, in four
    slightly different forms including two with the directional he suffix.
    The one additional instance is in the word mittaxat in Exodus 20:4,
    which has a double vowel for a rather different reason - alternative
    pronunciations of the word. So I can make a good argument that it would
    be less disruptive to change the encoding of these two words by, for
    example, adding CGJ 637 times, rather than changing every one of the
    million or so vowel points in the text. During an interim period before
    software and fonts have been updated to match an update to the standard,
    a text which is rendered incorrectly just 637 times in 5.25 MB would
    clearly be much less problematic than one which is quite illegible
    because the vowels in every word are unsupported.
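
    [A minimal sketch, in Python, of what "adding CGJ 637 times" could look
    like in practice; this is my illustration, not part of the WTS text or
    any formal proposal, and the vowel-point range and function name are
    assumptions. U+034F COMBINING GRAPHEME JOINER has combining class 0, so
    inserting it between two adjacent vowel points keeps canonical
    normalization from reordering them.]

```python
# Sketch only: insert CGJ between any two consecutive Hebrew vowel points.
CGJ = "\u034F"  # COMBINING GRAPHEME JOINER, combining class 0

# Hebrew vowel points sheva (U+05B0) through qubuts (U+05BB); an assumption
# about which marks count as "vowel points" for this purpose.
VOWEL_POINTS = {chr(cp) for cp in range(0x05B0, 0x05BC)}

def insert_cgj(text: str) -> str:
    """Return text with CGJ inserted between adjacent vowel points."""
    out = []
    prev_was_vowel = False
    for ch in text:
        if prev_was_vowel and ch in VOWEL_POINTS:
            out.append(CGJ)
        out.append(ch)
        prev_was_vowel = ch in VOWEL_POINTS
    return "".join(out)

# Lamed + patah + hiriq, the cluster discussed above:
word = "\u05DC\u05B7\u05B4"
fixed = insert_cgj(word)  # lamed, patah, CGJ, hiriq
```

    [Because CGJ blocks reordering but is otherwise ignorable for display,
    such a pass would touch only the 637 affected clusters and leave the
    rest of the text byte-for-byte unchanged.]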

    Second, I think Jony's point would be understood better in the context
    of the Ketiv and Qere phenomenon in the Hebrew Bible text. A proper
    description of this would, I suppose, be too long for this list (but I
    have just sent one in an off-list message, so let me know off-list if
    you would like an edited copy). But what it means is that the
    vowels in the word Yerushalaim were never really intended to go with the
    consonants (Ketiv = written) around which they appear in the text; they
    were intended to go with a different set of consonants (Qere = read
    aloud) which were used in pronunciation. In this case the only
    difference is that the Qere consonants include a yod before the final
    mem, and this should be pronounced with the hireq vowel. I suppose the
    question then arises of whether Unicode should encode what is actually
    written on the paper or how the editor intended it to be understood. If
    the former choice is made, there are actually quite a lot more anomalies
    in the Hebrew Bible text which will have to be looked into, including
    words with vowels but no consonants (e.g. in Ruth 3:17). If the latter,
    then we have the option of encoding this with some kind of markup of the
    same sort which will be necessary for other Ketiv/Qere pairs, i.e.
    encoding alternative representations of the word, one being the Ketiv
    consonants only and the other being the Qere consonants with the vowels.
    This is the approach taken in the WTS encoding for most Ketiv/Qere
    cases, where the Qere consonants are written in the margin, but not for
    cases of "perpetual Qere" like Yerushala(y)im where the Qere consonants
    are not written but are assumed to be known.

    But for me the most telling argument against the recent proposal is that
    it implies making an artificial division between biblical and modern
    Hebrew. These are not separate languages with separate writing systems.
    There has been a continuous written tradition from ancient times, and a
    very clearly attested one at least from the time of the earliest
    biblical and other manuscripts with vowel points, 10th century CE. (In
    earlier texts only the consonants were written.) There is no sensible
    place to make a division between the two encoding systems. Biblical and
    other ancient texts are still in regular use by modern Hebrew speakers.
    I have likened the situation to the use of Shakespeare and the King
    James Bible in modern English. In both languages it would cause
    considerable confusion, to say the least, to attempt to introduce
    different encodings for the same letter forms in older and modern texts.

    Peter Kirk

    This archive was generated by hypermail 2.1.5 : Sun Jul 06 2003 - 19:55:11 EDT