Re: Biblical Hebrew (Was: Major Defect in Combining Classes of Tibetan Vowels)

From: Kenneth Whistler
Date: Fri Jun 27 2003 - 22:47:44 EDT


    Peter responded:
    > Kenneth Whistler wrote on 06/26/2003 05:36:34 PM:
    > > Why is making use of the existing behavior of existing characters
    > > a "groanable kludge", if it has the desired effect and makes
    > > the required distinctions in text?
    > Why is it a kludge to insert some cc=0 control character into the text for
    > the sole purpose of preventing reordering during canonical ordering of two
    > combining marks that do interact typographically and so should but
    > nevertheless do not have the same combining class; and, moreover, to do so
    > using a control character that was not created for that purpose?
    > The answer seems so obvious, I wouldn't know how to begin responding.

    And others apparently had the same feeling. But I contend that
    the reason this seems odd is the way you present it to yourself
    and others.

    It isn't a matter of "my text is o.k. the way I entered it, but
    now I have to insert some invisible control character into the
    text for the sole purpose of preventing reordering -- which wasn't
    something I wanted to have happen in the first place."

    Instead, it is that for Biblical Hebrew, the following textual
    conventions are adopted:

       A sequence of patah followed by hiriq is represented by
           <patah, CGJ, hiriq>
       A sequence of hiriq followed by patah is represented by
           <hiriq, CGJ, patah>

    Then you build keyboards (or other abstractions) that obey
    those textual conventions.
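
    To make the two conventions above concrete, here is a minimal
    sketch in Python using the standard unicodedata module. The base
    letter (lamed) and the helper name with_cgj are assumptions chosen
    only for illustration; the point is that without CGJ both entry
    orders normalize to the same sequence, while with CGJ they stay
    distinct.

```python
import unicodedata

LAMED = "\u05DC"  # HEBREW LETTER LAMED, an arbitrary sample base (assumption)
PATAH = "\u05B7"  # HEBREW POINT PATAH, canonical combining class 17
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ, canonical combining class 14
CGJ = "\u034F"    # COMBINING GRAPHEME JOINER, canonical combining class 0


def nfc(text: str) -> str:
    """Normalize to NFC, which includes canonical ordering of marks."""
    return unicodedata.normalize("NFC", text)


def with_cgj(base: str, first: str, second: str) -> str:
    """Hypothetical helper: encode two points with CGJ between them."""
    return base + first + CGJ + second


# Without CGJ, canonical ordering sorts the points by combining class,
# so <patah, hiriq> is reordered to <hiriq, patah>.
plain_patah_first = nfc(LAMED + PATAH + HIRIQ)
plain_hiriq_first = nfc(LAMED + HIRIQ + PATAH)

# With CGJ (class 0) in between, the two points are never adjacent
# during canonical ordering, so the entered order survives.
cgj_patah_first = nfc(with_cgj(LAMED, PATAH, HIRIQ))
cgj_hiriq_first = nfc(with_cgj(LAMED, HIRIQ, PATAH))

print(plain_patah_first == plain_hiriq_first)  # distinction lost
print(cgj_patah_first == cgj_hiriq_first)      # distinction kept
```

    A keyboard or input method would emit the with_cgj form directly,
    so the person typing never has to know the CGJ is there.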

    You stop telling the Biblical Scholars that their text is
    screwed up because of Unicode and they have to "fix" it by
    inserting crazy control codes they don't know about, and
    chances are they will stop believing that their text is
    screwed up. :-)

    This isn't really any stranger than telling someone that for Twi, the
    following textual convention is adopted:

       An open o with an acute tone mark is represented by
           <open-o, combining acute>

    As long as the pieces stay firmly attached for entry, display,
    and searching, everybody is happy and nobody needs to be
    the wiser about what gimmicks the programmers are
    using under the covers.
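
    As a small illustration of the Twi case, the sketch below (again
    Python with the standard unicodedata module; the variable names
    are my own) shows that the open-o sequence has no precomposed
    form, so even normalized text carries two code points that the
    software has to keep together. A plain "o" with acute is included
    only for contrast.

```python
import unicodedata

OPEN_O = "\u0254"           # LATIN SMALL LETTER OPEN O
COMBINING_ACUTE = "\u0301"  # COMBINING ACUTE ACCENT

# <open-o, combining acute> as adopted by the textual convention.
open_o_acute = OPEN_O + COMBINING_ACUTE

# NFC leaves the sequence alone because no precomposed character exists.
composed_open_o = unicodedata.normalize("NFC", open_o_acute)

# Contrast: plain "o" plus acute composes to a single code point.
composed_plain_o = unicodedata.normalize("NFC", "o" + COMBINING_ACUTE)

print(len(composed_open_o), len(composed_plain_o))
```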

    And why should it be any stranger that maintenance of vowel
    point order in Biblical Hebrew cases with multiple points
    requires judicious use of an invisible combining mark like CGJ,
    when maintenance of visible directional layout distinctions
    for any Hebrew requires a boatload of invisible format controls?

    > If we want to insert a control character to prevent reordering under
    > canonical ordering, I think it would be preferable to create a new control
    > character for just that purpose:

    How would that be less of a kludge? I contend that inventing
    another invisible character *just* to do this is even more of
    a kludge than what I have suggested, when use of an existing
    character already has the desired effect.

    The end effect of the impulse you are describing here would
    be an attempt to create atomistic controls for each conceivable
    text effect, and I think the UTC has already given up on
    heading in that direction. It is already bad enough trying
    to keep straight all the possible interactions for the ones
    already created, as demonstrated by the discoveries we just
    made when trying to consider what happens if a ZWJ gets
    plunked down *between* two combining marks.

    > that would give a character that could be
    > used elsewhere for the very same purpose without needing to worry about
    > what unanticipated and undesirable effects might result by hijacking a
    > control created for some completely unrelated purpose.

    This was a more applicable criticism for the suggestions of RLM,
    ZWJ, or WJ, since their very status as format controls rather
    than combining marks had undesirable effects on the combining
    character sequences in question. I don't think the criticism applies
    to CGJ, however, since that character doesn't have any
    defined behavior other than what is needed here. And, as I
    indicated in a separate response, I do not think using CGJ
    for the purpose described in Biblical Hebrew is unrelated to
    its intent. It is just that nobody had yet thought through a
    scenario where it would prove useful between combining marks.
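
    The difference in status between these candidates can be checked
    directly from the character properties. The following is a
    minimal sketch in Python with the standard unicodedata module; the
    dictionary and function names are assumptions for illustration,
    and the values reported depend on the Unicode data shipped with
    the interpreter.

```python
import unicodedata

# Candidate characters discussed for blocking reordering.
CANDIDATES = {
    "RLM": "\u200F",  # RIGHT-TO-LEFT MARK
    "ZWJ": "\u200D",  # ZERO WIDTH JOINER
    "WJ": "\u2060",   # WORD JOINER
    "CGJ": "\u034F",  # COMBINING GRAPHEME JOINER
}


def describe(ch: str) -> tuple[str, int]:
    """Return (general category, canonical combining class) for ch."""
    return unicodedata.category(ch), unicodedata.combining(ch)


properties = {name: describe(ch) for name, ch in CANDIDATES.items()}

for name, (category, ccc) in properties.items():
    print(f"{name}: gc={category} ccc={ccc}")
```

    All four have combining class 0, so any of them would block
    reordering, but only CGJ is itself a combining mark (gc=Mn) rather
    than a format control (gc=Cf), so it does not interrupt the
    combining character sequence.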

