Re: [hebrew] Re: Hebrew Issues

From: John Hudson (
Date: Sun Aug 24 2003 - 20:22:40 EDT


    At 02:36 PM 8/24/2003, Peter Kirk wrote:

    >>Since actual glyph ligation is occurring, the ZWNJ should be used to
    >>inhibit ligation. This is consistent with the Unicode 4.0 description of
    >>ZWJ and ZWNJ behaviour. ...
    >But this is where the problem comes. Because ZWJ and ZWNJ are not
    >combining characters, they (theoretically, though not necessarily in your
    >implementation) break the combining character sequence and so the link
    >between the combining characters which follow it and the base character.
    >In fact the following combining characters become a defective combining
    >sequence whose rendering is undefined. I think MS Word currently inserts a
    >dotted circle in this case, and this is conformant behaviour in the case
    >of a defective combining sequence.
    >Is this correct, anyone, or am I overstating my case? Actually ZWJ is
    >theoretically less of a problem because it does specify a ligature between
    >the preceding and following combining character sequences. But ZWNJ
    >specifies that they should be rendered separately.

    Don't be too concerned about what happens in Word. There are known bugs
    that affect Biblical Hebrew, and there are known problems with using
    control characters in some circumstances. However, the issue regarding
    insertion of ZWJ and ZWNJ between combining marks needs clarification: word
    from Paul Nelson at MS is that it definitely should be possible to insert
    these characters between combining marks and so to affect the relationship
    of the marks on either side of the control character, i.e. not breaking the
    combining of the mark(s) following the control character with the preceding
    base character. This does, however, require glyph-space processing of the
    control characters.
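    The two readings can be made concrete with a minimal Python sketch of combining-sequence segmentation. It is illustrative only and does not describe how Word or any shaping engine works. `segment`, `is_defective` and the `joiners_continue` flag are hypothetical names. With `joiners_continue=False`, the sketch follows Peter's reading: only General_Category M characters extend a sequence, so a meteg after ZWNJ becomes a defective combining sequence. With `joiners_continue=True`, the joiner stays inside the sequence and the meteg still belongs to the base letter. That is the behaviour Paul Nelson describes.

```python
import unicodedata

ZWNJ, ZWJ = "\u200C", "\u200D"
JOINERS = {ZWNJ, ZWJ}


def is_mark(ch: str) -> bool:
    """True for General_Category Mn, Mc or Me (a 'combining character')."""
    return unicodedata.category(ch).startswith("M")


def segment(text: str, joiners_continue: bool = False) -> list[str]:
    """Split text into combining character sequences (illustrative sketch).

    joiners_continue=False: only marks extend a sequence, so ZWJ/ZWNJ stand
    alone and any marks after them start a new, defective sequence.
    joiners_continue=True: ZWJ/ZWNJ stay inside the current sequence, so marks
    after them still belong to the preceding base character.
    """
    clusters: list[str] = []
    can_attach = False  # may the next extending character join the last cluster?
    for ch in text:
        extends = is_mark(ch) or (joiners_continue and ch in JOINERS)
        if extends and can_attach:
            clusters[-1] += ch
        else:
            clusters.append(ch)
            # A standalone joiner cannot carry marks; bases and orphan marks can.
            can_attach = ch not in JOINERS
    return clusters


def is_defective(cluster: str) -> bool:
    """A sequence is defective if it starts with a mark instead of a base."""
    return is_mark(cluster[0])


ALEF, HATAF_PATAH, METEG = "\u05D0", "\u05B2", "\u05BD"
text = ALEF + HATAF_PATAH + ZWNJ + METEG  # <alef, hataf patah, ZWNJ, meteg>

for flag in (False, True):
    parts = segment(text, joiners_continue=flag)
    print(flag,
          [" ".join(f"U+{ord(c):04X}" for c in p) for p in parts],
          [is_defective(p) for p in parts])
```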

    >>... A question remains, however: should medial meteg with hataf be the
    >>default rendering of <hataf..., meteg>, or should such ligation require
    >><hataf..., ZWJ, meteg>? This is a rendering issue, but one which affects
    >>encoding: if one set of fonts treats ligation as default and another set
    >>doesn't, users will produce documents with conflicting encoding
    >>conventions depending on the rendering of the fonts they are using (one
    >>can even imagine a single document, set in multiple fonts, using
    >>different character sequences to obtain the same rendering). Personally,
    >>I favour having the medial meteg as default rendering for <hataf...,
    >>meteg>, requiring <hataf..., ZWNJ, meteg> in order to obtain a left
    >>meteg, because the medial meteg appears to be the most common positioning
    >>in the manuscript tradition.
    >If we do use ZWJ/ZWNJ, and based on the principle in the standard (TUS 4.0
    >pp. 389-390) "These characters are not to be used in all cases where
    >ligatures or cursive connections are desired; instead, they are only for
    >overriding the normal behavior of the text", I would suggest that <hataf, meteg> should
    >be rendered according to the font default which may vary (medial for a
    >font based on BHS, left meteg for a font based on an edition in which this
    >is the default); <hataf, ZWJ, meteg> should be used to prefer medial
    >despite the default (not sure if this is ever required); and <hataf,
    >ZWNJ, meteg> to inhibit medial when this must not be used (as in a few
    >cases in BHS).

    Okay, so we have:

    <hataf, meteg> = variable rendering depending on font
    <hataf, ZWJ, meteg> = always medial ligated form
    <hataf, ZWNJ, meteg> = always left meteg (post hataf)
    <meteg, CGJ, hataf> = always right meteg (pre hataf)

    I'm reasonably comfortable with that, but it suggests that authors and
    editors producing electronic documents, e.g. for web publishing, should
    always expressly encode their preference using ZWJ and ZWNJ, since they
    can't always or reliably determine what font will be used to display the text.
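    A publishing tool could apply that recommendation by rewriting every bare <hataf, meteg> into an explicit form before output. The Python below is a sketch with hypothetical names (`make_explicit`, `prefer_medial`). It uses a simplified model in which the hataf and meteg are adjacent after normalization. It is not a complete Biblical Hebrew normalizer.

```python
import re
import unicodedata

METEG = "\u05BD"
ZWNJ, ZWJ = "\u200C", "\u200D"
HATAFS = "\u05B1\u05B2\u05B3"  # hataf segol, hataf patah, hataf qamats

# A hataf immediately followed by meteg, with no joiner between them.
BARE_HATAF_METEG = re.compile(f"([{HATAFS}]){METEG}")


def make_explicit(text: str, prefer_medial: bool = True) -> str:
    """Rewrite bare <hataf, meteg> as <hataf, ZWJ, meteg> (medial) or
    <hataf, ZWNJ, meteg> (left), so rendering no longer depends on a font's
    default.

    NFC first puts <meteg, hataf> (no CGJ) into canonical order
    <hataf, meteg>, since the two are canonically equivalent. The right-meteg
    sequence <meteg, CGJ, hataf> and sequences that already carry ZWJ/ZWNJ
    are left unchanged. Sketch only: marks that sort between the hataf and
    the meteg (e.g. dagesh) are not handled.
    """
    joiner = ZWJ if prefer_medial else ZWNJ
    text = unicodedata.normalize("NFC", text)
    return BARE_HATAF_METEG.sub(lambda m: m.group(1) + joiner + METEG, text)


ALEF = "\u05D0"
# <alef, meteg, hataf patah> is reordered, then gets an explicit ZWJ.
print([f"U+{ord(c):04X}" for c in make_explicit(ALEF + METEG + "\u05B2")])
```

    Because canonical reordering places meteg after a hataf, input typed as <meteg, hataf> is also caught. The right-meteg sequence <meteg, CGJ, hataf> is preserved.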

    >>>... Thus my suggestion (= indicates canonical equivalence):
    >>>left meteg (non-hataf vowel): <vowel, meteg> = <meteg, vowel>
    >>>right meteg: <meteg, CGJ, vowel>
    >>>medial meteg (hataf vowel): <vowel, meteg> = <meteg, vowel>
    >>>left meteg (hataf vowel): <vowel, CGJ, meteg>
    >>I basically agree, with the following modification:
    >> left meteg (hataf vowel): <vowel, ZWNJ, meteg>
    >See the reasons above for not using this.
    >>Does this mean that we are agreed that the medial meteg rendering should
    >>be normative?
    >I am not intending to say that. I want to say that it can be the default
    >for a particular font or perhaps a font level attribute. Other fonts might
    >have left meteg as the default with hatafs and no medial meteg glyphs; in
    >that case the CGJ or ZWNJ would be ignored. Or they might have left meteg
    >as the default but also have medial meteg glyphs, in which case a
    >different mechanism would be required to request use of the medial meteg,
    >perhaps with ZWJ.
    >So here is a more nuanced version of my suggestion:
    >left meteg (non-hataf vowel): <vowel, meteg> = <meteg, vowel>
    >right meteg: <meteg, CGJ, vowel>
    >font's default position of meteg (hataf vowel): <vowel, meteg> = <meteg, vowel>
    >medial meteg (hataf vowel) (if supported by the font): TBD (<vowel, ZWJ, meteg> ???)
    >left meteg (hataf vowel): <vowel, CGJ, meteg>
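    Python's standard `unicodedata` module can check the canonical equivalences marked "=" above, and show how CGJ blocks them. The vowel points have lower canonical combining classes than meteg (22), so normalization moves meteg after the vowel. CGJ has combining class 0, so reordering cannot move marks across it. The helper name `equivalent` is used only for illustration.

```python
import unicodedata

ALEF = "\u05D0"
QAMATS, HATAF_PATAH = "\u05B8", "\u05B2"
METEG, CGJ = "\u05BD", "\u034F"


def equivalent(a: str, b: str) -> bool:
    """Canonical equivalence: the two strings have identical NFD forms."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# Canonical combining classes that drive canonical reordering.
for name, ch in [("qamats", QAMATS), ("hataf patah", HATAF_PATAH),
                 ("meteg", METEG), ("CGJ", CGJ)]:
    print(f"{name:12} U+{ord(ch):04X} ccc={unicodedata.combining(ch)}")

# <vowel, meteg> = <meteg, vowel>: meteg sorts after the vowel either way.
print(equivalent(ALEF + QAMATS + METEG, ALEF + METEG + QAMATS))
print(equivalent(ALEF + HATAF_PATAH + METEG, ALEF + METEG + HATAF_PATAH))

# CGJ (ccc 0) stops reordering, so <meteg, CGJ, vowel> and
# <vowel, CGJ, meteg> remain distinct from plain <vowel, meteg>.
print(equivalent(ALEF + METEG + CGJ + QAMATS, ALEF + QAMATS + METEG))
print(equivalent(ALEF + HATAF_PATAH + CGJ + METEG, ALEF + HATAF_PATAH + METEG))
```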
    >>>> 2.10 Extraordinary Points
    >>>>The SII encoded only the upper extraordinary point, as 05C4 HEBREW MARK
    >>>>UPPER DOT. A character for the lower dot could be added, although it
    >>>>appears only a few times.
    >>>Agreed. Although this latter character is rare, it is in regular and
    >>>undisputed use in a widely used text, and so probably does need to be encoded.
    >>I am content either to have the lower punctum encoded or to use a generic
    >>combining mark (U+0323), although the latter raises issues for
    >>multiscript fonts in applications that do not support writing
    >>system-specific glyph substitution (currently all applications). ...
    >Presumably a font could be programmed to substitute a glyph based on
    >context, especially for a combining mark where it would be relatively
    >simple to determine that the base character is in the Hebrew block and so
    >the Hebrew glyph variant is required. No help of course if you want an
    >isolated diacritic or a Qere without Ketiv form.

    Yes, this is possible, although the OpenType architecture is designed to
    deal with exactly this kind of language-specific substitution without
    needing to use glyph context, using the Language System tag and the
    Localised Forms <locl> layout feature. So I'd consider the glyph context
    approach to be a hack for apps that are not aware of Language System tags
    or don't process <locl>.
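    The glyph-context approach Peter describes can be sketched in a few lines of Python. This models only the selection logic, not a real font or shaping engine. The glyph names (`uni0323`, `uni0323.hebr`) and the function names are hypothetical. A real font would presumably express the rule as an OpenType contextual substitution or, as John prefers, through <locl> under a Hebrew language system. The last call shows the limitation Peter mentions: an isolated mark has no Hebrew base to trigger the variant.

```python
DOT_BELOW = "\u0323"                  # COMBINING DOT BELOW, proposed lower punctum
HEBREW_BLOCK = range(0x0590, 0x0600)  # U+0590..U+05FF


def default_glyph(ch: str) -> str:
    """Hypothetical 'uniXXXX' naming for a font's default glyphs."""
    return f"uni{ord(ch):04X}"


def glyph_for_mark(base: str, mark: str) -> str:
    """Choose a glyph for `mark` by looking at the block of its base letter.

    This is the glyph-context workaround: the variant is chosen from the
    base character, not from a language system tag via <locl>.
    """
    if mark == DOT_BELOW and ord(base) in HEBREW_BLOCK:
        return default_glyph(mark) + ".hebr"  # hypothetical Hebrew-style variant
    return default_glyph(mark)


print(glyph_for_mark("\u05D0", DOT_BELOW))  # on a Hebrew base letter
print(glyph_for_mark("a", DOT_BELOW))       # on a Latin base letter
print(glyph_for_mark("\u00A0", DOT_BELOW))  # isolated mark on NBSP: no Hebrew context
```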

    >>... What I am most keen to have is a clear statement from the UTC
    >>identifying 05C4 HEBREW MARK UPPER DOT as the upper punctum, as Jony
    >>indicates was intended by SII, and specifying a codepoint for the Hebrew
    >>number / masoretic note dot, which requires its own glyph and cannot be
    >>harmonised with the upper punctum character. Again, this could mean a new
    >>Hebrew block character or U+0307 could be used.
    >>Note that until Jony's note on SII's intent, I had presumed U+05C4 to be
    >>the number / masoretic note dot, because of the absence of a
    >>corresponding lower mark to indicate that it was the upper punctum. Now I
    >>would like a definitive ruling from the UTC, to avoid future confusion.
    >Agreed. Notes should be added to the code charts for U+05C4, e.g. "= upper
    >punctum extraordinarium", and for U+0307 e.g. "= Hebrew number dot", each
    >with pointers to the other.

    A question for Ken Whistler, if he is still following this: since Jony has
    indicated that SII intended U+05C4 for the upper punctum extraordinarium,
    is this sufficient for the editors of the standard to make a clarification
    in the text without a decision from the UTC? Even though this reverses my
    own interpretation of this character, I'm most keen to see a speedy resolution.

    John Hudson

    Tiro Typeworks
    Vancouver, BC

    The sight of James Cox from the BBC's World at One,
    interviewing Robin Oakley, CNN's man in Europe,
    surrounded by a scrum of furiously scribbling print
    journalists will stand for some time as the apogee of
    media cannibalism.
                             - Emma Brockes, at the EU summit

    This archive was generated by hypermail 2.1.5 : Sun Aug 24 2003 - 21:11:43 EDT