Re: [hebrew] Re: Hebrew Issues

From: Peter Kirk
Date: Sun Aug 24 2003 - 17:36:00 EDT

    On 24/08/2003 10:56, John Hudson wrote:

    > ... However, this does raise the question of what happens to the ZWNJ
    > in reordering
    > <bet, dagesh, holam, ZWNJ, alef>
    > If the holam ends up reordered before the dagesh, where does the ZWNJ
    > end up? If it remains immediately in front of the alef, that's fine.

    ZWNJ is not a combining character and so is unaffected by canonical
    reordering. Combining characters can never move from before it to after
    it or vice versa. Although CGJ is a combining character, it has the same
    effect on ordering as ZWNJ as its combining class is zero.
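
    The reordering described above can be checked with a short sketch,
    assuming Python's standard unicodedata module and NFD normalization as
    the way to trigger canonical reordering. The constant names are mine,
    for illustration only.

```python
import unicodedata

BET, ALEF = "\u05D1", "\u05D0"
DAGESH, HOLAM = "\u05BC", "\u05B9"  # combining classes 21 and 19
ZWNJ, CGJ = "\u200C", "\u034F"      # both have combining class 0

def code_points(text):
    """Format a string as a list of U+XXXX labels for easy reading."""
    return ["U+%04X" % ord(ch) for ch in text]

# <bet, dagesh, holam, ZWNJ, alef> as in the question quoted above
original = BET + DAGESH + HOLAM + ZWNJ + ALEF
reordered = unicodedata.normalize("NFD", original)

print(code_points(original))
print(code_points(reordered))
print(unicodedata.combining(ZWNJ), unicodedata.combining(CGJ))
```

    The holam (class 19) sorts ahead of the dagesh (class 21), while the
    ZWNJ, having class zero, stays immediately in front of the alef.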

    > ...
    >> In the absence of a CGNJ, and since CGJ does not have defined joining
    >> properties despite its misleading name, I have suggested using CGJ
    >> for this.
    > Since actual glyph ligation is occurring, the ZWNJ should be used to
    > inhibit ligation. This is consistent with the Unicode 4.0 description
    > of ZWJ and ZWNJ behaviour. ...

    But this is where the problem comes. Because ZWJ and ZWNJ are not
    combining characters, they (theoretically, though not necessarily in
    your implementation) break the combining character sequence and so the
    link between the combining characters which follow them and the base
    character. In fact the following combining characters become a defective
    combining sequence whose rendering is undefined. I think MS Word
    currently inserts a dotted circle in this case, and this is conformant
    behaviour in the case of a defective combining sequence.

    Is this correct, anyone, or am I overstating my case? Actually ZWJ is
    theoretically less of a problem because it does specify a ligature
    between the preceding and following combining character sequences. But
    ZWNJ specifies that they should be rendered separately.
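
    To make the concern concrete, here is a minimal sketch of the strict
    reading I describe above, in which a format character such as ZWJ or
    ZWNJ ends the combining character sequence. It is only an illustration
    of that reading, not a statement of how any renderer behaves; the
    function names and the choice of which categories count as a base are
    my assumptions.

```python
import unicodedata

ALEF, HATAF_PATAH, METEG = "\u05D0", "\u05B2", "\u05BD"
ZWJ, ZWNJ, CGJ = "\u200D", "\u200C", "\u034F"

def is_mark(ch):
    """True for combining marks (general categories Mn, Mc, Me)."""
    return unicodedata.category(ch).startswith("M")

def is_base(ch):
    """Assumption: controls, format characters and line/paragraph
    separators cannot serve as a base for following marks."""
    return unicodedata.category(ch) not in ("Cc", "Cf", "Zl", "Zp")

def defective_marks(text):
    """Return the indices of marks that have no base in front of them."""
    found = []
    has_base = False
    for index, ch in enumerate(text):
        if is_mark(ch):
            if not has_base:
                found.append(index)
        else:
            has_base = is_base(ch)
    return found

print(defective_marks(ALEF + HATAF_PATAH + ZWNJ + METEG))
print(defective_marks(ALEF + HATAF_PATAH + CGJ + METEG))
```

    Under this reading the meteg after ZWNJ is left without a base, whereas
    CGJ, being itself a combining mark, keeps the sequence intact.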

    > ... A question remains, however: should medial meteg with hataf be the
    > default rendering of <hataf..., meteg>, or should such ligation
    > require <hataf..., ZWJ, meteg>? This is a rendering issue, but one
    > which affects encoding: if one set of fonts treats ligation as default
    > and another set doesn't, users will produce documents with conflicting
    > encoding conventions depending on the rendering of the fonts they are
    > using (one can even imagine a single document, set in multiple fonts,
    > using different character sequences to obtain the same rendering).
    > Personally, I favour having the medial meteg as default rendering for
    > <hataf..., meteg>, requiring <hataf..., ZWNJ, meteg> in order to
    > obtain a left meteg, because the medial meteg appears to be the most
    > common positioning in the manuscript tradition.

    If we do use ZWJ/ZWNJ, and based on the principle in the standard (TUS
    4.0 pp. 389-390) "These characters are not to be used in all cases where
    ligatures or cursive connections are desired; instead, they are only for
    overriding the normal behavior of the text", I would suggest that
    <hataf, meteg> should be rendered according to the font default which
    may vary (medial for a font based on BHS, left meteg for a font based on
    an edition in which this is the default); <hataf, ZWJ, meteg> should be
    used to prefer medial despite the default (not sure if this is ever
    required); and <hataf, ZWNJ, meteg> to inhibit medial when this must not
    be used (as in a few cases in BHS).
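
    The override principle can be sketched as a small decision function.
    This is a hypothetical illustration only: meteg_position and its
    font_default parameter are names I have made up, and no real rendering
    engine is implied.

```python
ZWJ, ZWNJ = "\u200D", "\u200C"
METEG = "\u05BD"
HATAF_VOWELS = {"\u05B1", "\u05B2", "\u05B3"}  # hataf segol, patah, qamats

def meteg_position(sequence, font_default):
    """Decide where the meteg goes for <hataf [, ZWJ | ZWNJ], meteg>.

    font_default is "medial" or "left" (a hypothetical font attribute).
    """
    if font_default not in ("medial", "left"):
        raise ValueError("font_default must be 'medial' or 'left'")
    if len(sequence) < 2 or sequence[0] not in HATAF_VOWELS or sequence[-1] != METEG:
        raise ValueError("expected a hataf vowel followed by a meteg")
    between = sequence[1:-1]
    if between == "":
        return font_default  # no override: the font decides
    if between == ZWJ:
        return "medial"      # request the ligated form
    if between == ZWNJ:
        return "left"        # inhibit the ligated form
    raise ValueError("unexpected characters between vowel and meteg")

print(meteg_position("\u05B2" + METEG, "medial"))
print(meteg_position("\u05B2" + ZWNJ + METEG, "medial"))
```

    The point of the sketch is that the plain sequence carries no
    preference of its own; only the joiner characters override the font.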

    > ...
    >> ... Thus my suggestion (= indicates canonical equivalence):
    >> left meteg (non-hataf vowel): <vowel, meteg> = <meteg, vowel>
    >> right meteg: <meteg, CGJ, vowel>
    >> medial meteg (hataf vowel): <vowel, meteg> = <meteg, vowel>
    >> left meteg (hataf vowel): <vowel, CGJ, meteg>
    > I basically agree, with the following modification:
    > left meteg (hataf vowel): <vowel, ZWNJ, meteg>

    See the reasons above for not using this.

    > Does this mean that we are agreed that the medial meteg rendering
    > should be normative?

    I am not intending to say that. I want to say that it can be the default
    for a particular font or perhaps a font level attribute. Other fonts
    might have left meteg as the default with hatafs and no medial meteg
    glyphs; in that case the CGJ or ZWNJ would be ignored. Or they might
    have left meteg as the default but also have medial meteg glyphs, in
    which case a different mechanism would be required to request use of the
    medial meteg, perhaps with ZWJ.

    So here is a more nuanced version of my suggestion:

    left meteg (non-hataf vowel): <vowel, meteg> = <meteg, vowel>
    right meteg: <meteg, CGJ, vowel>
    font's default position of meteg (hataf vowel): <vowel, meteg> = <meteg,
    vowel>
    medial meteg (hataf vowel) (if supported by the font): TBD (<vowel, ZWJ,
    meteg> ???)
    left meteg (hataf vowel): <vowel, CGJ, meteg>
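
    The equivalences claimed in this list can be checked with a short
    sketch, assuming Python's unicodedata module and taking "canonically
    equivalent" to mean "identical after NFD normalization". Patah and
    hataf patah stand in for the two vowel types; the helper name is mine.

```python
import unicodedata

METEG, CGJ = "\u05BD", "\u034F"  # meteg has combining class 22, CGJ class 0
PATAH = "\u05B7"                 # a non-hataf vowel, combining class 17
HATAF_PATAH = "\u05B2"           # a hataf vowel, combining class 12

def canonically_equivalent(a, b):
    """Two strings are canonically equivalent if their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

# <vowel, meteg> = <meteg, vowel>
print(canonically_equivalent(PATAH + METEG, METEG + PATAH))
# right meteg <meteg, CGJ, vowel> stays distinct from <vowel, meteg>
print(canonically_equivalent(METEG + CGJ + PATAH, PATAH + METEG))
# left meteg with hataf <vowel, CGJ, meteg> stays distinct from <vowel, meteg>
print(canonically_equivalent(HATAF_PATAH + CGJ + METEG, HATAF_PATAH + METEG))
```

    Because CGJ has combining class zero, the sequences containing it
    survive normalization and so can carry a distinction that the bare
    sequences cannot.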

    > ...
    >>> 2.10 Extraordinary Points
    >>> The SII encoded only the upper extraordinary point, as 05C4 HEBREW
    >>> MARK UPPER DOT. A character for the lower dot could be added,
    >>> although it appears only a few times.
    >> Agreed. Although this latter character is rare, it is in regular and
    >> undisputed use in a widely used text, and so probably does need to be
    >> encoded.
    > I am content either to have the lower punctum encoded or to use a
    > generic combining mark (U+0323), although the latter raises issues for
    > multiscript fonts in applications that do not support writing
    > system-specific glyph substitution (currently all applications). ...

    Presumably a font could be programmed to substitute a glyph based on
    context, especially for a combining mark where it would be relatively
    simple to determine that the base character is in the Hebrew block and
    so the Hebrew glyph variant is required. No help of course if you want
    an isolated diacritic or a Qere without Ketiv form.
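
    As a rough sketch of the contextual test I have in mind, the following
    looks back from a U+0323 to its base character and checks whether that
    base lies in the Hebrew block. It is written in Python purely for
    illustration; a real font would do this in its own lookup tables, and
    the function names and the set of placeholder bases are my assumptions.

```python
import unicodedata

DOT_BELOW = "\u0323"
# Assumption: bases commonly used to show a diacritic in isolation
PLACEHOLDER_BASES = {" ", "\u00A0", "\u25CC"}

def in_hebrew_block(ch):
    """True if the character lies in the Hebrew block U+0590..U+05FF."""
    return "\u0590" <= ch <= "\u05FF"

def wants_hebrew_dot(text, index):
    """For the U+0323 at text[index], return True if its base is Hebrew,
    False if it is some other script, and None if context cannot tell."""
    if text[index] != DOT_BELOW:
        raise ValueError("text[index] is not U+0323")
    i = index - 1
    # Skip back over any other combining marks to reach the base
    while i >= 0 and unicodedata.category(text[i]).startswith("M"):
        i -= 1
    if i < 0 or text[i] in PLACEHOLDER_BASES:
        return None  # isolated diacritic: no script context available
    return in_hebrew_block(text[i])

print(wants_hebrew_dot("\u05D0\u05B7\u0323", 2))
print(wants_hebrew_dot("a\u0323", 1))
print(wants_hebrew_dot("\u00A0\u0323", 1))
```

    The None result corresponds to the cases mentioned above where context
    gives no help.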

    > ... What I am most keen to have is a clear statement from the UTC
    > identifying 05C4 HEBREW MARK UPPER DOT as the upper punctum, as Jony
    > indicates was intended by SII, and specifying a codepoint for the
    > Hebrew number / masoretic note dot, which requires its own glyph and
    > cannot be harmonised with the upper punctum character. Again, this
    > could mean a new Hebrew block character or U+0307 could be used.
    > Note that until Jony's note on SII's intent, I had presumed U+05C4 to
    > be the number / masoretic note dot, because of the absence of a
    > corresponding lower mark to indicate that it was the upper punctum.
    > Now I would like a definitive ruling from the UTC, to avoid future
    > confusion.

    Agreed. Notes should be added to the code charts for U+05C4, e.g. "=
    upper punctum extraordinarium", and for U+0307 e.g. "= Hebrew number
    dot", each with pointers to the other.

    Peter Kirk