Re: Biblical Hebrew (Was: Major Defect in Combining Classes of Tibetan Vowels)

From: Kenneth Whistler (
Date: Thu Jun 26 2003 - 18:36:34 EDT

    Peter responded:

    > Ken Whistler wrote on 06/25/2003 06:57:56 PM:
    > > People could consider, for example, representation
    > > of the required sequence:
    > >
    > > <lamed, qamets, hiriq, final mem>
    > >
    > > as:
    > >
    > > <lamed, qamets, ZWJ, hiriq, final mem>
    > So, we want to introduce yet *another* distinct semantic for ZWJ?

    Actually, no, I don't. That was just the first candidate that
    came to mind.
    > We've
    > got one for Indic, another for Arabic, another for ligatures (similar to
    > that for Arabic, but slightly different). Now another that is "don't
    > affect any visual change, just be there to inhibit reordering under
    > canonical ordering / normalization"?

    As I pointed out in a separate response, just putting the ZWJ
    there would *already* interrupt the reordering of the sequence.
    There is nothing new about that. The problem is that you might
    not be able to count on it not effecting a visual change,
    because the generic meaning of ZWJ is now intended to be
    ligation-requesting, which does have visual consequences.
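    The blocking behavior described above can be checked with
    Python's standard `unicodedata` module. This is a minimal sketch:
    results come from whatever Unicode version the interpreter
    bundles, though the combining classes of these points are frozen
    by the stability policy. Qamets has class 18 and hiriq class 14,
    so with nothing between them hiriq is sorted first. ZWJ has
    class 0, so it leaves the typed order alone.

```python
import unicodedata

LAMED, QAMETS, HIRIQ, FINAL_MEM = "\u05DC", "\u05B8", "\u05B4", "\u05DD"
ZWJ = "\u200D"


def code_points(s):
    """List the code points of s as U+XXXX strings."""
    return [f"U+{ord(c):04X}" for c in s]


plain = LAMED + QAMETS + HIRIQ + FINAL_MEM
with_zwj = LAMED + QAMETS + ZWJ + HIRIQ + FINAL_MEM

# Qamets (class 18) and hiriq (class 14) are adjacent, so they get sorted.
print(code_points(unicodedata.normalize("NFD", plain)))
# ['U+05DC', 'U+05B4', 'U+05B8', 'U+05DD']

# ZWJ (class 0) separates them, so the typed order survives.
print(code_points(unicodedata.normalize("NFD", with_zwj)))
# ['U+05DC', 'U+05B8', 'U+200D', 'U+05B4', 'U+05DD']
```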

    I now prefer the suggestions of RLM or WJ for this. Both
    of those format controls, by *definition*, should have no
    impact on visual display in this context: the RLM because it
    would be inserted between two NSMs that pick up strong
    R-to-L directionality from the consonant, and the WJ
    because it would be inserted at a position where there already
    is no word/line break opportunity. And either of them,
    by its current definition and properties, would break the
    sequences for canonical reordering. So they already have
    the semantics of the putative new control in question: no
    effect on visual display, while inhibiting the canonical
    reordering of the point sequence.
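    Canonical ordering only ever reorders within a maximal run of
    characters whose combining class is non-zero, so any class-0
    character ends the run. The sketch below reimplements just that
    ordering step with the standard `unicodedata` module. It is an
    illustration, not a conforming normalizer, since it skips
    decomposition and so only handles text that needs none. It then
    checks that ZWJ, RLM (U+200F) and WJ (U+2060) all have class 0 and
    all leave qamets before hiriq.

```python
import unicodedata

QAMETS, HIRIQ = "\u05B8", "\u05B4"
CONTROLS = {"ZWJ": "\u200D", "RLM": "\u200F", "WJ": "\u2060"}


def canonical_order(s):
    """Canonical ordering for text that needs no decomposition:
    stably sort each maximal run of non-zero-class marks by class.
    A class-0 character flushes (ends) the current run."""
    out, run = [], []
    for ch in s:
        if unicodedata.combining(ch) == 0:
            out.extend(sorted(run, key=unicodedata.combining))  # stable sort
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=unicodedata.combining))
    return "".join(out)


for label, ctrl in CONTROLS.items():
    seq = "\u05DC" + QAMETS + ctrl + HIRIQ + "\u05DD"  # lamed ... final mem
    # Prints the control's class and whether each ordering left seq unchanged.
    print(label, unicodedata.combining(ctrl),
          canonical_order(seq) == seq,
          unicodedata.normalize("NFD", seq) == seq)
# ZWJ 0 True True
# RLM 0 True True
# WJ 0 True True
```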

    > > The presence of a ZWJ (cc=0) in the sequence would block
    > > the canonical reordering of the sequence to hiriq before
    > > qamets. If that is the essence of the problem needing to
    > > be addressed, then this is a much simpler solution which would
    > > impact neither the stability of normalization nor require
    > > mass cloning of vowels in order to give them new combining
    > > classes.
    > Yes, it would accomplish all that; and it is a groanable kludge.

    Why is making use of the existing behavior of existing characters
    a "groanable kludge", if it has the desired effect and makes
    the required distinctions in text? If there is not some
    rendering system or font lookup showstopper here, I'm inclined
    to think it's a rather elegant way out of the problem.

    > At least with
    > having distinct vowel characters for Biblical Hebrew, we'd come to a point
    > we could forget about it, and wouldn't be wincing every time we considered
    > it.

    Au contraire. We'll be wincing forever for this one. There's
    no way of getting around the fact that this is merely a cloning
    of the whole set of points in order to have candidates for
    a reassigned set of combining classes.

    You're stuck between a rock and a hard place on this one.

    The UTC cannot entertain merely fixing the existing combining
    class assignments, because it breaks the normalization stability
    guarantee. We've all come to acknowledge and most to accept that,
    even though it still elicits groans.
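    Why a reassignment breaks the guarantee can be shown with a toy
    model. In this sketch the REVISED class value is invented purely
    for illustration and is not a proposal from this thread. Text
    stored in normalized form under the actual classes stops being
    normalized once the classes change. Every normalized copy already
    in the wild would then silently fail a normalization check.

```python
# Toy model of canonical ordering driven by an explicit class table.
# Characters not in the table are treated as class 0 (starters).
ACTUAL = {"\u05B4": 14, "\u05B8": 18}   # hiriq, qamets: the real UCD classes
REVISED = {"\u05B4": 20, "\u05B8": 18}  # HYPOTHETICAL reassignment, illustration only


def reorder(s, ccc):
    """Stably sort each run of non-zero-class marks by the classes in `ccc`."""
    out, run = [], []
    for ch in s:
        if ccc.get(ch, 0) == 0:
            out.extend(sorted(run, key=ccc.__getitem__))
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=ccc.__getitem__))
    return "".join(out)


text = "\u05DC\u05B8\u05B4\u05DD"  # lamed, qamets, hiriq, final mem
stored = reorder(text, ACTUAL)     # what a normalizer writes today: hiriq first
print(stored == reorder(stored, ACTUAL))   # True: stays normalized
print(stored == reorder(stored, REVISED))  # False: no longer normalized
```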

    But in the 10646 WG2 context, coming in with a duplicate set
    of Hebrew points is not going to make any sense, because, as
    someone (John Cowan?) has already pointed out, 10646 doesn't
    assign combining classes, and so trying to justify character
    cloning on the basis of distinct combining class assignments
    isn't going to get anywhere there. You can always come in
    with the proposal to encode BIBLICAL HEBREW POINT PATAH and
    say, even though the glyph is identical, see, the name is
    different, so the character is different. But this is a pretty
    thin disguise, and is vulnerable to simple questioning:
    What is it for? Well, to point Biblical Hebrew texts. But
    what was U+05B7 HEBREW POINT PATAH for? Well, to point Biblical
    Hebrew texts (or any Hebrew text, for that matter...). Well,
    then, what is the difference? Uh, the combining classes for
    the two are different. What is a combining class? ... and
    so on.

    I'm trying to find a way, using existing characters and a
    simple set of text representational conventions, to make
    the distinctions and preserve the order relations that you
    need for decent font lookup, without the whole enterprise
    washing up on either of those two rocks.
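    One way such a convention could be applied mechanically is
    sketched below. This is an assumption-laden illustration, not a
    convention the thread settles on. The helper name
    `protect_point_order` is hypothetical. The choice of control is
    left to the caller: WJ is used in the example, and RLM is the
    other candidate mentioned above. The input is assumed to already
    have the points in the order the encoder intends. The helper
    inserts the control at every descent in combining class, so each
    remaining run is already in canonical order and normalization
    leaves it alone.

```python
import unicodedata

WJ = "\u2060"  # WORD JOINER; the message also proposes RLM (U+200F)


def protect_point_order(s, control):
    """Hypothetical helper: put `control` between any two adjacent marks whose
    combining classes are in descending order, so that canonical ordering
    (and hence NFC/NFD) leaves the marks in the order they were typed."""
    if unicodedata.combining(control) != 0:
        raise ValueError("control must have combining class 0")
    out = []
    prev = 0  # combining class of the previous character
    for ch in s:
        cur = unicodedata.combining(ch)
        if prev > cur > 0:
            out.append(control)  # class 0: ends the current reorderable run
        out.append(ch)
        prev = cur
    return "".join(out)


word = "\u05DC\u05B8\u05B4\u05DD"  # lamed, qamets, hiriq, final mem
encoded = protect_point_order(word, WJ)
print([f"U+{ord(c):04X}" for c in encoded])
# ['U+05DC', 'U+05B8', 'U+2060', 'U+05B4', 'U+05DD']
print(unicodedata.normalize("NFC", encoded) == encoded)  # True
```

    Text whose points are already in canonical order (hiriq before
    qamets, say) passes through unchanged, so the control appears
    only where the intended order would otherwise be lost.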


    This archive was generated by hypermail 2.1.5 : Thu Jun 26 2003 - 19:12:03 EDT