Re: [hebrew] Re: Hebrew Issues

From: John Hudson
Date: Sun Aug 24 2003 - 13:56:55 EDT

    [Bcc'd to the SBL BibLit project discussion list.]

    At 03:13 PM 8/23/2003, Peter Kirk wrote:

    >> 2.2 Holam Alef

    >>Although the rules concerning this case are fairly straightforward, the
    >>rendering engine should not need to know so much grammar.

    >I'm a little surprised, Jony, that you came to this conclusion. It seems
    >to me that this one is a rendering issue. You have argued before that in
    >most typesetting this shift is not made. It has been demonstrated (in Ezra
    >SIL and SBL Hebrew with Uniscribe) that it is feasible for a rendering
    >engine to implement these rules, in the cases where this shift is required
    >for high quality e.g. biblical publications. The biblical text already
    >contains sufficient information to guide the rendering engine, except
    >possibly for a few special cases, and in the spirit of "thou shalt not add
    >thereto" I prefer not to do so when, as here, it is not absolutely necessary.

    I agree with Peter: it is not a problem for the rendering (in this case,
    font lookups) to handle this holam repositioning contextually.

    >>A possible solution is to use ZWJ to indicate the shifting of the Holam
    >>forward. For example, Bet Dagesh Holam ZWJ Alef.
    >Agreed, if a mechanism is required. My preference is to use this encoding
    >only for special cases where the shift takes place as an exception to the
    >regular rules, and to use ZWNJ instead of ZWJ to inhibit such shifting in
    >cases where it is not required.

    Again, I agree:

             <bet, dagesh, holam, alef> = holam repositioned on alef

             <bet, dagesh, holam, ZWNJ, alef> = holam retained on bet
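
    As a rough illustration of the convention in the two sequences above, here
    is a minimal Python sketch. The helper name holam_moves_to_alef is
    hypothetical, and the logic is a deliberate simplification: it looks only
    at the character sequence, not at the grammatical conditions a real font
    lookup or rendering engine might also check. Marks between the holam and
    the next letter are skipped, since normalisation may place the dagesh after
    the holam.

```python
import unicodedata

HOLAM = "\u05B9"  # HEBREW POINT HOLAM
ALEF = "\u05D0"   # HEBREW LETTER ALEF
ZWNJ = "\u200C"   # ZERO WIDTH NON-JOINER


def holam_moves_to_alef(text: str) -> bool:
    """Hypothetical helper: True if the first holam in `text` would be
    repositioned on a following alef under the convention sketched above."""
    pos = text.find(HOLAM)
    if pos < 0:
        return False
    for ch in text[pos + 1:]:
        if unicodedata.combining(ch) != 0:
            continue  # skip other marks (e.g. dagesh) after the holam
        if ch == ZWNJ:
            return False  # ZWNJ inhibits the shift: holam stays on its base
        return ch == ALEF  # shift only when the next letter is alef
    return False


BET, DAGESH = "\u05D1", "\u05BC"
print(holam_moves_to_alef(BET + DAGESH + HOLAM + ALEF))
print(holam_moves_to_alef(BET + DAGESH + HOLAM + ZWNJ + ALEF))
```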

    >By the way, your example is not in canonical order (although it is in
    >logical order, see my comments on 2.8 below), and will be reordered to
    ><bet, holam, dagesh, ZWJ, alef>.

    Thankfully, this is one of the mark reordering cases that the font lookups
    can handle: we just need to make sure that the context is large enough for
    other marks to fall between the holam and the alef. However, this does
    raise the question of what happens to the ZWNJ in reordering:

             <bet, dagesh, holam, ZWNJ, alef>

    If the holam ends up reordered before the dagesh, where does the ZWNJ end
    up? If it remains immediately in front of the alef, that's fine.
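
    One way to check this is to normalise the sequence and see where each
    character lands. The sketch below uses Python's standard unicodedata
    module, so the result reflects the Unicode data shipped with the Python
    build in use; the variable names are only illustrative. ZWNJ has
    combining class 0, so canonical reordering should not move it: only the
    dagesh (class 21) and holam (class 19) swap places.

```python
import unicodedata

BET, ALEF = "\u05D1", "\u05D0"
DAGESH, HOLAM = "\u05BC", "\u05B9"
ZWNJ = "\u200C"

# Logical (typing) order: bet, dagesh, holam, ZWNJ, alef
typed = BET + DAGESH + HOLAM + ZWNJ + ALEF
normalized = unicodedata.normalize("NFD", typed)

# Canonical combining classes drive the reordering.
for ch in (DAGESH, HOLAM, ZWNJ):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: ccc={unicodedata.combining(ch)}")

# Show the sequence after canonical reordering.
print(" ".join(f"U+{ord(ch):04X}" for ch in normalized))
```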

    >>For simpler cases, such as Yerushala(y)im, a zero width invisible base
    >>character could be used. Various possibilities had been discussed. CGJ is
    >>not appropriate because it is not a base character. ZWNBSP would have
    >>been suitable, except that it has been taken over by the BOM.
    >I fail to see a good reason not to use CGJ in such a case. The Unicode
    >distinction between a base character and a combining character is a
    >technical one which does not need to align perfectly with every user's

    I agree. I understand the logic in inserting an invisible base character in
    a place where readers 'know' there is a missing consonant, but the
    consonant *is* missing: it is not there and should not be there. CGJ works
    fine in this instance, because the only important thing to do is to make
    sure that the two vowels are not reordered.
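
    To make the reordering concern concrete, here is a small Python sketch
    using the standard unicodedata module. It assumes the two vowels on the
    lamed are patah followed by hiriq, which is an illustrative choice and not
    a claim about any particular edition. Without CGJ, normalisation swaps the
    vowels because hiriq has a lower combining class than patah; with CGJ
    (combining class 0) between them, the typed order survives.

```python
import unicodedata

LAMED, FINAL_MEM = "\u05DC", "\u05DD"
PATAH, HIRIQ = "\u05B7", "\u05B4"  # combining classes 17 and 14
CGJ = "\u034F"                     # COMBINING GRAPHEME JOINER, class 0

# Two vowels on one consonant, in the intended reading order.
plain = LAMED + PATAH + HIRIQ + FINAL_MEM
with_cgj = LAMED + PATAH + CGJ + HIRIQ + FINAL_MEM

plain_nfd = unicodedata.normalize("NFD", plain)
cgj_nfd = unicodedata.normalize("NFD", with_cgj)

print("order kept without CGJ:", plain_nfd == plain)
print("order kept with CGJ:   ", cgj_nfd == with_cgj)
```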

    >>The medial Meteg in the Hataf vowels could be a rendering issue, a
    >>combining marks ligature. However, in this case we would need a CGNJ when
    >>a left Meteg is needed together with a Hataf.
    >In the absence of a CGNJ, and since CGJ does not have defined joining
    >properties despite its misleading name, I have suggested using CGJ for this.

    Since actual glyph ligation is occurring, the ZWNJ should be used to inhibit
    ligation. This is consistent with the Unicode 4.0 description of ZWJ and
    ZWNJ behaviour. A question remains, however: should medial meteg with hataf
    be the default rendering of <hataf..., meteg>, or should such ligation
    require <hataf..., ZWJ, meteg>? This is a rendering issue, but one which
    affects encoding: if one set of fonts treats ligation as default and
    another set doesn't, users will produce documents with conflicting encoding
    conventions depending on the rendering of the fonts they are using (one can
    even imagine a single document, set in multiple fonts, using different
    character sequences to obtain the same rendering). Personally, I favour
    having the medial meteg as default rendering for <hataf..., meteg>,
    requiring <hataf..., ZWNJ, meteg> in order to obtain a left meteg, because
    the medial meteg appears to be the most common positioning in the
    manuscript tradition.

    >>For the right Meteg, a new character is needed.

    >But I disagree that a new character is needed. This is essentially an
    >alternative positioning of the same combining character relative to other
    >combining characters with which it interferes typographically. This should
    >have been dealt with by appropriate allocation of combining classes. As it
    >was not, the appropriate mechanism seems to be to use CGJ to inhibit
    >canonical reordering. Thus my suggestion (= indicates canonical equivalence):
    >left meteg (non-hataf vowel): <vowel, meteg> = <meteg, vowel>
    >right meteg: <meteg, CGJ, vowel>
    >medial meteg (hataf vowel): <vowel, meteg> = <meteg, vowel>
    >left meteg (hataf vowel): <vowel, CGJ, meteg>

    I basically agree, with the following modification:

             left meteg (hataf vowel): <vowel, ZWNJ, meteg>
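
    The behaviour of these sequences under normalisation can be checked with a
    short Python sketch (standard unicodedata module; hataf patah is used as
    the example vowel, and the variable names are illustrative only). It shows
    that <vowel, meteg> and <meteg, vowel> are canonically equivalent, while
    the CGJ and ZWNJ sequences keep their order and remain distinct.

```python
import unicodedata

METEG = "\u05BD"        # HEBREW POINT METEG, combining class 22
HATAF_PATAH = "\u05B2"  # HEBREW POINT HATAF PATAH, combining class 12
CGJ, ZWNJ = "\u034F", "\u200C"
ALEF = "\u05D0"         # any base consonant would do here


def nfd(s: str) -> str:
    return unicodedata.normalize("NFD", s)


vowel_meteg = ALEF + HATAF_PATAH + METEG
meteg_vowel = ALEF + METEG + HATAF_PATAH
right_meteg = ALEF + METEG + CGJ + HATAF_PATAH
left_meteg = ALEF + HATAF_PATAH + ZWNJ + METEG

print("vowel+meteg == meteg+vowel:", nfd(vowel_meteg) == nfd(meteg_vowel))
print("right meteg order kept:    ", nfd(right_meteg) == right_meteg)
print("left meteg order kept:     ", nfd(left_meteg) == left_meteg)
```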

    Does this mean that we are agreed that the medial meteg rendering should be

    >> 2.9 Inverted Nun
    >>In the Bible there are a few cases of a special mark known as "Inverted
    >>Nun". It is probably not an inverted letter Nun, and requires its own
    >>character, HEBREW MARK INVERTED NUN.

    Agreed. Who wants to write the proposal? I have some good graphics showing
    various manuscript forms of this letter, clearly distinguished in form from
    the nun.

    >> 2.10 Extraordinary Points
    >>The SII encoded only the upper extraordinary point, as 05C4 HEBREW MARK
    >>UPPER DOT. A character for the lower dot could be added, although it
    >>appears only a few times.
    >Agreed. Although this latter character is rare, it is in regular and
    >undisputed use in a widely used text, and so probably does need to be encoded.

    I am content either to have the lower punctum encoded or to use a generic
    combining mark (U+0323), although the latter raises issues for multiscript
    fonts in applications that do not support writing system-specific glyph
    substitution (currently all applications). What I am most keen to have is a
    clear statement from the UTC identifying 05C4 HEBREW MARK UPPER DOT as the
    upper punctum, as Jony indicates was intended by SII, and specifying a
    codepoint for the Hebrew number / masoretic note dot, which requires its
    own glyph and cannot be harmonised with the upper punctum character. Again,
    this could mean a new Hebrew block character or U+0307 could be used.

    Note that until Jony's note on SII's intent, I had presumed U+05C4 to be
    the number / masoretic note dot, because of the absence of a corresponding
    lower mark to indicate that it was the upper punctum. Now I would like a
    definitive ruling from the UTC, to avoid future confusion.

    >> 2.12 Number Dots
    >>An old practice was to use dots and double dots above to distinguish "non
    >>words", such as numbers and acronyms. For several centuries this usage
    >>has been replaced by the use of Geresh and Gershayim.
    >>The dots always appear on unpointed texts. There is nothing special about
    >>them, so the existing Unicodes 0307 and 0308 could be used.

    Okay, that's fine with me, but I'd still like to see a note in the standard
    re. U+05C4.

    John Hudson

    Tiro Typeworks
    Vancouver, BC

    The sight of James Cox from the BBC's World at One,
    interviewing Robin Oakley, CNN's man in Europe,
    surrounded by a scrum of furiously scribbling print
    journalists will stand for some time as the apogee of
    media cannibalism.
                             - Emma Brockes, at the EU summit
