Re: [hebrew] Re: Hebrew composition model, with cantillation marks

From: Peter Kirk (
Date: Wed Oct 29 2003 - 15:10:55 CST

On 29/10/2003 10:26, Philippe Verdy wrote:

> ...
>The problem I see here is that ZWJ is not intended to create ligatures
>between diacritics, only between clusters that would otherwise still be a
>single combining sequence.
>Normally CGJ would have fitted better there, but this conflicts with the
>intent to address the canonical combining order with CGJ.
>In the proposal, ...
I'm not sure this is a good way to describe the proposal. It was a
joint proposal from several people, including Peter Constable, who was
then with SIL International, and it was put on the SIL web site for
convenience. Peter and some of the other original proposers have since
had second thoughts about it. Don't assume that it represents the
current thinking of SIL or of anyone else.

>... the medial meteg is missing, but not the right and
>left meteg, as they are encoded within the same class and their order is
>preserved when attached to a vowel.
>Logically, ...
Graphically, yes. In origin, probably. But I know that Unicode has
rejected any notion of these hataf vowels being combinations of sheva
and another vowel, even at the compatibility level.
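
The point about the hataf vowels can be checked against the Unicode
Character Database: none of them carries a decomposition mapping,
canonical or compatibility. A minimal check using Python's standard
`unicodedata` module (results reflect whichever Unicode version the
interpreter ships with; `has_decomposition` is a helper name introduced
here for illustration):

```python
import unicodedata

# The three hataf vowels, which graphically resemble sheva plus another vowel.
HATAF = ["\u05B1", "\u05B2", "\u05B3"]

def has_decomposition(ch: str) -> bool:
    """True if the UCD gives ch any decomposition mapping (canonical or compatibility)."""
    return unicodedata.decomposition(ch) != ""

for ch in HATAF:
    # NFKD applies both canonical and compatibility mappings; the vowel stays intact.
    unchanged = unicodedata.normalize("NFKD", ch) == ch
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
          f"decomposes={has_decomposition(ch)}, NFKD unchanged={unchanged}")
```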

>... the hataf vowel is made of two parts: hataf and a second vowel,
>and the medial meteg is put in the middle. There are two solutions:
>- either encode <hataf, meteg, second part of the vowel> with the
>proposed new biblical vowels, which all belong to the same class
In view of the above, this would probably not be acceptable.

>- or add a medial meteg that combines with and modifies the hataf vowel,
>and would normally be coded after that vowel.
Or use some kind of character, existing or new, to promote ligation
between the vowel and the meteg.

>As the proposal also keeps cantillation marks in the same combining
>class 220 as vowels and meteg, the order will be significant, as the medial
>meteg must combine with the hataf vowel, not with the cantillation mark.
Requests to ligate meteg with any other mark would simply be ignored, in
the same way as ZWJ is ignored between base characters that cannot be
ligated. While it is obviously not good to sprinkle text with
superfluous ligation marks, CGJs, etc., these need not be made illegal
or automatically removed, any more than it is necessary to remove
superfluous ZWJs between characters which cannot be ligated. As these
characters are default ignorable, applications and renderers should
simply ignore them when they are superfluous; but they should not
delete them, as no single application can be sure which character pairs
actually have ligatures in any particular font.
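
This division of labour can be sketched as a toy renderer. Everything in
it is hypothetical: the "font" is just a dictionary of ligatable pairs,
the glyph name is made up, and ZWJ (U+200D) merely stands in for
whatever character might be chosen to request ligation between a vowel
and meteg. A superfluous request is skipped when building the glyph
run, while the stored text is left untouched.

```python
# Hypothetical font data: character pairs this (imaginary) font can ligate
# when a joiner requests it. Real fonts expose this through shaping tables.
FONT_LIGATURES = {("\u05B2", "\u05BD"): "<hataf-patah+medial-meteg>"}

ZWJ = "\u200D"  # stand-in for whatever ligation-request character is chosen

def glyph_run(text: str, ligatures: dict) -> list:
    """Build a display run from text, honouring ZWJ only where a ligature exists.

    A superfluous ZWJ (one whose neighbours the font cannot ligate) is simply
    skipped when drawing. The input string is never modified, so another font
    with more ligatures could still act on the same request.
    """
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == ZWJ:
            i += 1  # a joiner with no usable ligature draws nothing
            continue
        if i + 2 < len(text) and text[i + 1] == ZWJ and (ch, text[i + 2]) in ligatures:
            out.append(ligatures[(ch, text[i + 2])])
            i += 3  # consume first char, joiner, second char
            continue
        out.append(ch)
        i += 1
    return out

stored = "\u05D1\u05A5" + ZWJ + "\u05BD"  # bet, merkha, ZWJ, meteg: no ligature
print(glyph_run(stored, FONT_LIGATURES))   # the ZWJ is ignored for display...
print(ZWJ in stored)                       # ...but is still in the stored text
```

Because the joiner is never deleted, a font whose table did list the
pair would render the same stored text with the ligature.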

>This is not a problem because other cantillation marks that combine below
>are not split into two halves like hataf vowels. This means that no medial
>meteg could occur within a cantillation mark, and so only a normal meteg
>could possibly occur; but this causes a rendering problem if the text is not
>normalized directly on input, as no normalization form will reorder them.
>So the proposal PDF leaves open the choice of the combining class to
>use for the new vowels and meteg that combine below, and they could be given
>class 28 as well, allowing the new meteg to be reordered before cantillation
>marks.
>So I do think that the new vowels and meteg proposed by should not
>be given the same class 220 as cantillation marks, which should be reordered
>after all vowels and meteg, and that class 28 for them would be
>preferable, unless there is some proof that vowels or meteg can follow
>cantillation marks (meaning that there would be a second logical vowel group
>on the same consonant, and in that case we still have a problem because not
>all cantillation marks share the same class 220).
But there is such proof. A few days ago I posted some text describing
how meteg and certain other low accents (class 220 ones, fortunately)
occur together, and in both orders. There are also cases of
vowel-accent-vowel, in that order, below a single base character; see
section 3.2. So the best thing is to put every low mark into class 220,
except for the two (dehi and yetiv) which are always positioned to the right.
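
Why the class assignment matters can be shown with the Canonical
Ordering Algorithm that every normalization form applies: within a run
of combining marks, marks of different classes are sorted by class,
while marks of the same class keep their typed order. The sketch below
implements only that reordering step (no decomposition or composition)
and uses two hypothetical class tables covering just meteg (U+05BD) and
merkha (U+05A5); merkha is picked only as an example of a class-220 low
accent, and the helper names are illustrative.

```python
from typing import Callable, Dict

def canonical_order(text: str, ccc: Callable[[str], int]) -> str:
    """Apply the Canonical Ordering Algorithm under a given class function.

    Within each maximal run of non-starters (class != 0), characters are
    stably sorted by combining class. A class-0 character ends the run.
    """
    result, run = [], []
    for ch in text:
        if ccc(ch) == 0:
            result.extend(sorted(run, key=ccc))  # sorted() is stable
            result.append(ch)
            run = []
        else:
            run.append(ch)
    result.extend(sorted(run, key=ccc))
    return "".join(result)

def class_table(classes: Dict[str, int]) -> Callable[[str], int]:
    """Hypothetical class assignment; unlisted characters are treated as class 0."""
    return lambda ch: classes.get(ch, 0)

BET, MERKHA, METEG = "\u05D1", "\u05A5", "\u05BD"

# Option quoted above: meteg in class 28, low accents in class 220.
CLASS_28 = class_table({METEG: 28, MERKHA: 220})
# Option argued for here: every low mark (dehi and yetiv aside) in class 220.
ALL_220 = class_table({METEG: 220, MERKHA: 220})

accent_first = BET + MERKHA + METEG
print(canonical_order(accent_first, CLASS_28) == BET + METEG + MERKHA)  # order lost
print(canonical_order(accent_first, ALL_220) == accent_first)           # order kept
```

With separate classes the accent-first spelling is silently rewritten to
meteg-first; with a shared class 220 both orders remain distinct.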

>Here again the proposal does not solve everything, and there persists the
>need to encode an ignorable control with class 0 to separate two vowel groups
>applied to the same consonant group (I call a "consonant group" the Unicode
>sequence made of: a single base consonant letter, with an optional sin/shin
>dot above right or left, and an optional dagesh/rafe/varika point inside or
>centered above).
>That's why the choice between classes 220 and 28 in the proposal is
>not important.
As I see it, the " proposal" with class 220 for all low marks
except dehi and yetiv does successfully do away with the need for a
CGJ-type control. The objections to this proposal are of quite a different type.
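
For comparison, the class-0 control described in the quoted text can be
tried against the classes Unicode actually assigns. In the Unicode
Character Database meteg (U+05BD) has the fixed-position class 22,
merkha (U+05A5, again just an example of a class-220 low accent) has
class 220, and CGJ (U+034F) has class 0. A short check with Python's
standard `unicodedata` module:

```python
import unicodedata

BET, MERKHA, METEG, CGJ = "\u05D1", "\u05A5", "\u05BD", "\u034F"

# Actual UCD combining classes for these characters.
assert unicodedata.combining(METEG) == 22
assert unicodedata.combining(MERKHA) == 220
assert unicodedata.combining(CGJ) == 0

accent_first = BET + MERKHA + METEG
meteg_first = BET + METEG + MERKHA

# Both spellings normalize to the same NFD string: the accent-first order is lost.
print(unicodedata.normalize("NFD", accent_first) == meteg_first)

# A class-0 character between the marks ends the reordering run, so the order survives.
protected = BET + MERKHA + CGJ + METEG
print(unicodedata.normalize("NFD", protected) == protected)
```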

Peter Kirk (personal) (work)

This archive was generated by hypermail 2.1.5 : Thu Jan 18 2007 - 15:54:25 CST