RE: Yerushala(y)im - or Biblical Hebrew

From: Peter Kirk
Date: Sun Jul 06 2003 - 19:15:54 EDT

    Peter Constable wrote on Thu Jul 03 2003 - 11:52:52 EDT:

    > Jony Rosenne wrote on 07/02/2003 05:55:02 AM:
    > > I would like to summarize my understanding:
    > I agree with you on most points, but would quibble on the first, as I
    > find it overgeneralizes and is not explicit enough.
    > > 1. The sequence Lamed Patah Hiriq is invalid for Hebrew. It is
    > > invalid in Hebrew to have two vowels for one letter. It may or may
    > > not be a valid Unicode sequence, but there are many examples of
    > > valid Unicode sequences that are invalid.
    > We need to state more carefully *what* is invalid. The facts are that
    > spellings such as lamed patah hiriq *are* attested in literature and
    > encoded representations are needed for them. These spellings are invalid
    > as written representations of Hebrew that are consistent with Hebrew
    > phonology; but their use in literature is not assumed to be consistent
    > with Hebrew phonology; they are used *in spite of the fact* that they are
    > inconsistent with Hebrew phonology. This is not normal Hebrew spelling,
    > but the literature to be encoded includes abnormal spellings, and they
    > have as much need to be represented as the normal spellings.
    > It appears to me that you are trying to establish invalidity of such
    > sequences as a basis to argue that encoded representations should involve
    > some character between the two vowels. I consider this reasoning flawed,
    > however: the encoded representation is a representation of the *text*, not
    > the phonology, and the text most certainly does include sequences such as
    > lamed patah hiriq. It may be that we end up deciding to adopt an encoded
    > representation for this that involves a character between the two vowels,
    > but that is a technical-design choice, and not something that we are
    > compelled to do because of the nature of the Hebrew language and normal
    > conventions of Hebrew spelling.
    > - Peter
    > ---------------------------------------------------------------------------
    > Peter Constable
    > Non-Roman Script Initiative, SIL International
    > 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
    > Tel: +1 972 708 7485
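
    [An aside to illustrate the point at issue, a sketch added here and not
    part of either original message: the sequence lamed + patah + hiriq is a
    perfectly well-formed Unicode code point sequence, but canonical
    normalization sorts combining marks by combining class, and hiriq
    (class 14) sorts before patah (class 17), so the as-written vowel order
    is not preserved.]

```python
import unicodedata

# Lamed (U+05DC) + patah (U+05B7) + hiriq (U+05B4), in as-written order.
seq = "\u05DC\u05B7\u05B4"

# Canonical normalization reorders the marks by combining class:
# hiriq has ccc 14, patah has ccc 17, so hiriq moves before patah.
nfc = unicodedata.normalize("NFC", seq)
print(nfc == "\u05DC\u05B4\u05B7")  # the vowel order was not preserved
```

    [This reordering under normalization is part of why an intervening
    character between the two vowels was under discussion at all.]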

    Like Ted Hopp, I have been reading through the recent postings on
    Hebrew, because I saw the proposal for encoding a separate set of
    biblical Hebrew vowels and was seriously concerned by it. For ten years
    until last year I was a member of SIL International, working with the
    biblical Hebrew text, and regularly provided technical input to Peter
    Constable and his colleagues on Hebrew and other non-Roman scripts.
    Before joining SIL I was a software developer and served on ECMA
    standards committees.

    I have a couple of points to make now on this issue. First, it might
    help to get an idea of the scale of the problem. In the WTS encoded text
    of the BHS Hebrew Bible, which comes to 5.25 MB in UTF-8 and so contains
    a million or so vowel points, there are just 637 instances of two vowel points on
    one consonant. Of these, 636 are the word Yerushala(y)im, in four
    slightly different forms including two with the directional he suffix.
    The one additional instance is in the word mittaxat in Exodus 20:4,
    which has a double vowel for a rather different reason - alternative
    pronunciations of the word. So I can make a good argument that it would
    be less disruptive to change the encoding of these two words by, for
    example, adding CGJ 637 times, rather than changing every one of the
    million or so vowel points in the text. During an interim period before
    software and fonts have been updated to match an update to the standard,
    a text which is rendered incorrectly just 637 times in 5.25 MB would
    clearly be much less problematic than one which is quite illegible
    because the vowels in every word are unsupported.
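
    [A minimal sketch, in Python, of what "adding CGJ 637 times" could look
    like in practice; this is my illustration, not part of the WTS text or
    any formal proposal, and the vowel-point range and function name are
    assumptions. U+034F COMBINING GRAPHEME JOINER has combining class 0, so
    inserting it between two adjacent vowel points keeps canonical
    normalization from reordering them.]

```python
# Sketch only: insert CGJ between any two consecutive Hebrew vowel points.
CGJ = "\u034F"  # COMBINING GRAPHEME JOINER, combining class 0

# Hebrew vowel points sheva (U+05B0) through qubuts (U+05BB); an assumption
# about which marks count as "vowel points" for this purpose.
VOWEL_POINTS = {chr(cp) for cp in range(0x05B0, 0x05BC)}

def insert_cgj(text: str) -> str:
    """Return text with CGJ inserted between adjacent vowel points."""
    out = []
    prev_was_vowel = False
    for ch in text:
        if prev_was_vowel and ch in VOWEL_POINTS:
            out.append(CGJ)
        out.append(ch)
        prev_was_vowel = ch in VOWEL_POINTS
    return "".join(out)

# Lamed + patah + hiriq, the cluster discussed above:
word = "\u05DC\u05B7\u05B4"
fixed = insert_cgj(word)  # lamed, patah, CGJ, hiriq
```

    [Because CGJ blocks reordering but is otherwise ignorable for display,
    such a pass would touch only the 637 affected clusters and leave the
    rest of the text byte-for-byte unchanged.]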

    Second, I think Jony's point would be understood better in the context
    of the Ketiv and Qere phenomenon in the Hebrew Bible text. A proper
    description of this would, I suppose, be too long for this list (but I
    have just sent one in an off-list message, so let me know off-list if
    you would like an edited copy). But what it means is that the
    vowels in the word Yerushalaim were never really intended to go with the
    consonants (Ketiv = written) around which they appear in the text; they
    were intended to go with a different set of consonants (Qere = read
    aloud) which were used in pronunciation. In this case the only
    difference is that the Qere consonants include a yod before the final
    mem, and this should be pronounced with the hireq vowel. I suppose the
    question then arises of whether Unicode should encode what is actually
    written on the paper or how the editor intended it to be understood. If
    the former choice is made, there are actually quite a lot more anomalies
    in the Hebrew Bible text which will have to be looked into, including
    words with vowels but no consonants (e.g. in Ruth 3:17). If the latter,
    then we have the option of encoding this with some kind of markup of the
    same sort which will be necessary for other Ketiv/Qere pairs, i.e.
    encoding alternative representations of the word, one being the Ketiv
    consonants only and the other being the Qere consonants with the vowels.
    This is the approach taken in the WTS encoding for most Ketiv/Qere
    cases, where the Qere consonants are written in the margin, but not for
    cases of "perpetual Qere" like Yerushala(y)im where the Qere consonants
    are not written but are assumed to be known.

    But for me the most telling argument against the recent proposal is that
    it implies making an artificial division between biblical and modern
    Hebrew. These are not separate languages with separate writing systems.
    There has been a continuous written tradition from ancient times, and a
    very clearly attested one at least from the time of the earliest
    biblical and other manuscripts with vowel points, 10th century CE. (In
    earlier texts only the consonants were written.) There is no sensible
    place to make a division between the two encoding systems. Biblical and
    other ancient texts are still in regular use by modern Hebrew speakers.
    I have likened the situation to the use of Shakespeare and the King
    James Bible in modern English. In both languages it would cause
    considerable confusion, to say the least, to attempt to introduce
    different encodings for the same letter forms in older and modern texts.

    Peter Kirk

    This archive was generated by hypermail 2.1.5 : Sun Jul 06 2003 - 19:55:11 EDT