RE: Hebrew shaping (was RE: Benefits of Unicode)

From: Jonathan Rosenne (rosenne@qsm.co.il)
Date: Mon Feb 26 2001 - 07:13:24 EST


In Hebrew the exceptions are in abbreviations and foreign words, but they are
not so rare.

The most common case is a hard Pe at the end of a word, as in Philip, which is
written with the non-final form.

We have been encoding Hebrew since the 1950s, on punch cards, and the decision
taken then for Hebrew was to have 5 extra letters for the final forms. Since
then, we have gathered a vast installed base of programs and data. See
http://www.qsm.co.il/Hebrew/HebKey.htm for the IBM 1401 version.
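
To make the point concrete, here is a minimal sketch in Python of what a purely automatic "final form at end of word" rule would do. It is only an illustration, not how any real system works: the names FINAL_FORMS and naive_shape are made up for this example, the code points are the Unicode ones rather than those of the older encodings mentioned above, and the spelling of "Philip" used here (pe, yod, lamed, yod, pe) is assumed to be one common spelling.

```python
import unicodedata

# Hypothetical table: the five Hebrew letters that have a separately
# encoded final form in Unicode, mapped base letter -> final letter.
FINAL_FORMS = {
    "\u05DB": "\u05DA",  # KAF   -> FINAL KAF
    "\u05DE": "\u05DD",  # MEM   -> FINAL MEM
    "\u05E0": "\u05DF",  # NUN   -> FINAL NUN
    "\u05E4": "\u05E3",  # PE    -> FINAL PE
    "\u05E6": "\u05E5",  # TSADI -> FINAL TSADI
}


def naive_shape(word: str) -> str:
    """Hypothetical shaper: always use the final form at the end of a word."""
    if word and word[-1] in FINAL_FORMS:
        return word[:-1] + FINAL_FORMS[word[-1]]
    return word


# Assumed spelling of "Philip": pe, yod, lamed, yod, pe (non-final Pe last).
philip = "\u05E4\u05D9\u05DC\u05D9\u05E4"
shaped = naive_shape(philip)

# The naive rule changes the last letter, which is the unwanted result here.
print(unicodedata.name(philip[-1]))
print(unicodedata.name(shaped[-1]))
print(shaped == philip)
```

Because the rule cannot tell a hard Pe from a soft one, it rewrites the last letter of the name. Keeping the final forms as separate letters leaves that choice to the writer.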

Jony

> -----Original Message-----
> From: Marco Cimarosti [mailto:marco.cimarosti@essetre.it]
> Sent: Monday, February 26, 2001 12:36 PM
> To: Unicode List
> Subject: Hebrew shaping (was RE: Benefits of Unicode)
>
>
> Sorry for coming back so late on an old issue (29 Jan 2001).
>
> I (Marco Cimarosti) wrote:
> > Each different positional form of a letter in Arabic, Syriac or Mongolian
> > is encoded with the same code point; the rendering engine must select the
> > proper form. The same problem in Greek and Hebrew has been addressed using
> > different code points for final and non-final letters, that must be
> > allocated to separate entries on the keyboard.
>
> Jonathan Rosenne replied:
> > Arabic and Hebrew are misleadingly similar in this respect.
> > While Arabic shaping is rather regular, Hebrew has too many exceptions,
> > making automatic shaping unsuitable.
>
> I tried to find out something on my own but had no success.
>
> All the Hebrew grammar books I have at home just say that the final form of
> letters is used at the end of words, full stop. But all my books are things
> like "Learn Hebrew yourself in two weeks", and my references about Yiddish
> are even more layman level.
>
> Can you describe these exceptions? How frequent are they? In which
> language(s) do they occur?
>
> I know that also the Arabic script sometimes deviates from its basic shaping
> rules (e.g. in abbreviations, in texts about grammar, and even in ordinary
> Farsi spelling), but these exceptions are rare enough that Unicode and other
> encoding systems preferred to address them with specialized layout controls
> (ZWJ, ZWNJ, TATWEEL). How is Hebrew different?
>
> Toda raba for any info.
>
> _ Marco
>


