Hebrew composition model, with cantillation marks

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Tue Oct 28 2003 - 13:04:09 CST


This is a separate issue (not strictly related to combining classes or
ordering): it concerns the current description of the Hebrew script in the
Unicode reference (chapter 8.1).

I see that the description includes the following text:

[quote]
When points and marks are located below the same base letter, the point
always comes first (on the right) and the mark after it (on the left),
except for the marks yetiv, U+059A HEBREW ACCENT YETIV, and dehi, U+05AD
HEBREW ACCENT DEHI, which come first (on the right) and are followed (on
the left) by the point.
These rules are followed when points and marks are located above the same
base letter:
. If the point is holam, all cantillation marks precede it (on the right),
except pashta, U+0599 HEBREW ACCENT PASHTA.
. Pashta always follows (goes to the left of) points.
. Holam on a sin consonant (shin base + sin dot) follows (goes to the left
of) the sin dot. However, the two combining marks are sometimes rendered as
a single assimilated dot.
. Shin dot and sin dot are generally represented closer vertically to the
base letter than other points and marks that go above it.
[/quote]

All this seems quite verbose. In fact these rules could be more easily
modeled by assigning positioning properties for all Hebrew points, vowels,
accents and marks. If you look at the above rules, the exceptions come
directly from the positioning of the diacritic, which is among these:

1) In the consonants group:
1.1) The base consonant (from LETTER ALEF to LETTER TAV, and their variants
or ligatures) is positioned: central
1.2) The pronunciation modifier (currently only U+05BC POINT DAGESH) is also
positioned: central
1.3) The consonant modifiers (U+05C1 POINT SHIN DOT and U+05C2 POINT SIN
DOT) are positioned respectively: above-right, above-left

2) In the vowels group:
2.1) The vowel points (from U+05B0 POINT SHEVA to U+05BB POINT QUBUTS) are
all positioned: below, except U+05B9 POINT HOLAM which is positioned:
above-left.
2.2) The vowel modifiers U+05BD POINT METEG, and U+05BF POINT RAFE (and its
variant U+FB1E JUDEO-SPANISH VARIKA) are positioned respectively: below and
above (the varika is also above).

3) Cantillation accents (from U+0591 ACCENT ETNAHTA to U+05AE ACCENT ZINOR)
and (or?) marks (U+05AF MARK MASORA CIRCLE and U+05C4 MARK UPPER DOT).
They fall into one of the following 6 positioning categories:
- above:
    for most of them, excepting these listed below;
- above-right:
    U+059D ACCENT GERESH MUQDAM,
    U+05A0 ACCENT TELISHA GEDOLA;
- above-left:
    U+0599 ACCENT PASHTA,
    U+05A1 ACCENT PAZER,
    U+05A9 ACCENT TELISHA QETANA;
- below-right:
    U+059A ACCENT YETIV,
    U+05AD ACCENT DEHI;
- below:
    U+0591 ACCENT ETNAHTA,
    U+0596 ACCENT TIPEHA (tarha),
    U+059B ACCENT TEVIR,
    U+05A3 ACCENT MUNAH,
    U+05A4 ACCENT MAHAPAKH,
    U+05A5 ACCENT MERKHA,
    U+05A6 ACCENT MERKHA KEFULA,
    U+05A7 ACCENT DARGA,
    U+05AA ACCENT YERAH BEN YOMO;
- below-left:
    U+05AE ACCENT ZINOR;
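
To make the proposal concrete, the classification above can be written down
as a lookup table. This is only a sketch of the idea: the names POSITION,
assign and position_of are hypothetical, the table is not an existing Unicode
property, and the code points U+05A2 and U+05BA are left out as an assumption
because the message does not classify them.

```python
# Positioning categories as proposed in this message.
# NOT an existing Unicode property; names and layout are illustrative only.
POSITION = {}

def assign(position, code_points):
    """Record the same positioning category for several code points."""
    for cp in code_points:
        POSITION[cp] = position

# 1.2 and 1.3: dagesh, shin dot, sin dot
assign("central", [0x05BC])
assign("above-right", [0x05C1])
assign("above-left", [0x05C2])

# 2.1: vowel points U+05B0..U+05BB are below, except holam (U+05B9).
# U+05BA is skipped (assumption: the message does not discuss it).
assign("below", [cp for cp in range(0x05B0, 0x05BC) if cp not in (0x05B9, 0x05BA)])
assign("above-left", [0x05B9])

# 2.2: meteg below, rafe and varika above
assign("below", [0x05BD])
assign("above", [0x05BF, 0xFB1E])

# 3: accents and marks default to "above" (U+05A2 skipped, same assumption),
# then the listed exceptions override that default.
assign("above", [cp for cp in range(0x0591, 0x05AF) if cp != 0x05A2])
assign("above", [0x05AF, 0x05C4])
assign("above-right", [0x059D, 0x05A0])
assign("above-left", [0x0599, 0x05A1, 0x05A9])
assign("below-right", [0x059A, 0x05AD])
assign("below", [0x0591, 0x0596, 0x059B, 0x05A3, 0x05A4,
                 0x05A5, 0x05A6, 0x05A7, 0x05AA])
assign("below-left", [0x05AE])

def position_of(ch):
    """Return the proposed category, or None if the character is not covered."""
    cp = ord(ch)
    if 0x05D0 <= cp <= 0x05EA:  # base letters ALEF..TAV (variants not handled)
        return "central"
    return POSITION.get(cp)
```

For example, position_of("\u059A") gives "below-right" for yetiv, and
position_of("\u05B9") gives "above-left" for holam.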

If you look at the rules quoted above, you'll see that these exceptions are
not really exceptions if you just consider how the 6 positioning
categories (plus the central position for base letters and dagesh) interact.
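
As a rough illustration of that interaction, the sketch below orders the
diacritics on one base letter from right to left using only their positioning
category, with points placed before accents when both share a category. The
names H_SLOT, SAMPLE_POSITION and right_to_left are hypothetical, and the
tie-break is an assumption taken from the quoted text ("the point always
comes first"), not a rule stated elsewhere.

```python
# Horizontal slot of each positioning category: 0 = right, 1 = centre, 2 = left.
H_SLOT = {"below-right": 0, "below": 1, "below-left": 2,
          "above-right": 0, "above": 1, "above-left": 2}

# A small sample of the proposed classification (illustrative only).
SAMPLE_POSITION = {
    "\u05B7": "below",        # POINT PATAH
    "\u05A3": "below",        # ACCENT MUNAH
    "\u059A": "below-right",  # ACCENT YETIV
    "\u05B9": "above-left",   # POINT HOLAM
    "\u0599": "above-left",   # ACCENT PASHTA
    "\u0594": "above",        # ACCENT ZAQEF QATAN
}

def is_point(mark):
    """True for the vowel points U+05B0..U+05BB."""
    return "\u05B0" <= mark <= "\u05BB"

def right_to_left(marks, row):
    """Return the marks of one row ('above' or 'below'), rightmost first."""
    in_row = [m for m in marks if SAMPLE_POSITION[m].startswith(row)]
    # Sort by horizontal slot; within a slot, points come before accents.
    return sorted(in_row,
                  key=lambda m: (H_SLOT[SAMPLE_POSITION[m]], 0 if is_point(m) else 1))
```

With this ordering, patah precedes munah, yetiv precedes patah, and zaqef
qatan precedes holam, which in turn precedes pashta, without any of these
cases being coded as an exception.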

Normally, these categories should have matched a combining class, but the
combining classes have not been defined this way, along the lines of the
composition groups 1.1, 1.2, 1.3, 2.1, 2.2 and 3 described above.
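
For comparison, the combining classes actually assigned to these characters
can be inspected with Python's standard unicodedata module. This is a small
sketch; the helper name combining_classes and the sample list are
hypothetical, and the values printed depend on the Unicode data shipped with
the Python version in use.

```python
import unicodedata

# A few of the characters discussed above.
SAMPLES = ["\u05B0",  # POINT SHEVA
           "\u05BC",  # POINT DAGESH
           "\u0591",  # ACCENT ETNAHTA
           "\u059A",  # ACCENT YETIV
           "\u0599"]  # ACCENT PASHTA

def combining_classes(chars):
    """Map each character to its canonical combining class."""
    return {c: unicodedata.combining(c) for c in chars}

if __name__ == "__main__":
    for ch, ccc in combining_classes(SAMPLES).items():
        print(f"U+{ord(ch):04X}  ccc={ccc:3d}  {unicodedata.name(ch)}")
```

The points carry individual fixed-position classes, while the accents share a
handful of general positional classes, so neither set lines up with the
composition groups listed above.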

Could such a positioning category become a normative property of Hebrew
points, vowels, accents and marks?


