Re: Major Defect in Combining Classes of Tibetan Vowels

From: Kenneth Whistler (kenw@sybase.com)
Date: Wed Jun 25 2003 - 20:43:50 EDT


    John Hudson wrote:

    > This idea of Hebrew vowels as 'fixed' marks is problematical, because in
    > Biblical Hebrew they are not fixed: they move relative to additional marks
    > (other vowels or cantillation marks).
    >
    > >It may be more *difficult* for applications to do correct rendering,
    > >but there was never any intention in the standard that I know
    > >of that a sequence <hiriq, patah> would render differently
    > >than a sequence <patah, hiriq>.
    >
    > Yes, this is what I am saying is wrong: <hiriq, patah> *should* render
    > differently from <patah, hiriq>. This example is particularly important,
    > because it occurs in the spelling of yerushalaim, the Masoretic
    > approximation of yerushalayim. Correct rendering requires that the hiriq
    > follows the patah, and not vice versa.

    Understood. See my separate response on the Biblical Hebrew thread.

    > They are necessary to render Biblical Hebrew text correctly using current
    > font and layout engine technologies. These technologies work perfectly for
    > Biblical Hebrew so long as Unicode canonical ordering is ignored. I think
    > there is very little impetus to change or develop new implementations to
    > take into account what strikes most of those involved with Biblical Hebrew
    > text processing as an error in Unicode.

    "so long as Unicode canonical ordering is ignored". But as you
    and Peter point out, you cannot actually ignore canonical
    ordering, since in the Internet context it is outside of
    the end user's control. Once text escapes your own system
    for interchange, it may be subject to normalization, and you
    are kaput.

    As stated, this is also turning into a typical--dare I say, religious--
    confrontation of "I'm right and you're wrong" with no compromise
    in prospect and people getting ready to shoot themselves in the
    foot to prove they are right.

    You say there is little impetus to change or develop new implementations,
    and yet the very solutions being proposed, e.g., by Peter, would
    force reencoding of all the Biblical Hebrew text to work at all,
    and would, ipso facto, require new implementations and new fonts
    to work right.

    The alternative I suggested, of agreeing on a text representational
    convention of <vowel, ZWJ, vowel> for those instances of sequences
    which should not reorder could be implemented *now* with
    existing characters, and only minor extensions to the fonts and
    to keyboard methods. Any existing corpus could be updated
    en masse (and more easily than switching over to Peter's scheme),
    or incrementally, as appropriate.
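    As a rough illustration of the en-masse update, here is a minimal
    Python sketch. It assumes the existing corpus already stores every
    mark sequence in its intended order and contains no precomposed
    presentation forms. It inserts ZWJ only where standard canonical
    ordering would otherwise swap two adjacent marks. The helper name
    protect_mark_order is hypothetical and not part of any library or
    agreed convention. A real conversion would still need editorial
    review of which sequences are meant to be distinct.

```python
import unicodedata

ZWJ = "\u200D"  # ZERO WIDTH JOINER: combining class 0, so it blocks reordering


def protect_mark_order(text: str) -> str:
    """Hypothetical helper: insert ZWJ wherever canonical ordering would
    swap two adjacent combining marks, i.e. where ccc(prev) > ccc(cur) > 0.

    Assumes every mark sequence in `text` is already in its intended order
    and that `text` contains no precomposed presentation forms.
    """
    out = []
    prev_ccc = 0
    for ch in text:
        ccc = unicodedata.combining(ch)
        if 0 < ccc < prev_ccc:
            out.append(ZWJ)  # start a new run so the two marks are never compared
        out.append(ch)
        prev_ccc = ccc
    return "".join(out)


# lamed + patah + hiriq: hiriq following patah, as in the quoted message
word = "\u05DC\u05B7\u05B4"
print(unicodedata.normalize("NFC", word) == word)            # False: NFC swaps the vowels
protected = protect_mark_order(word)
print(unicodedata.normalize("NFC", protected) == protected)  # True: the order survives
```

    The test ccc(prev) > ccc(cur) > 0 is exactly the condition under
    which canonical ordering swaps two adjacent marks, so sequences that
    are already in canonical order pass through unchanged.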

    The other alternative that some seem to prefer: just change the
    combining classes and be done with it -- is *not* going to
    happen. It would fly in the face of politically committed
    stability guarantees by the UTC and required by the IETF and
    W3C. An inconvenience for Biblical Hebrew implementations is
    not going to outweigh that, for any of the committees involved.
    And even if, by some miracle, it *were* to happen, you would
    also be awaiting the rollout of new implementations, since
    you'd have to wait through the chaotic transition while everyone
    updated their normalization algorithms.

    Just picking up the marbles and going home isn't an option,
    either. As you indicate, "so long as Unicode canonical ordering
    is ignored" the existing layout technologies work just fine.
    So address the problem with an appropriate fix. Insert a
    ZWJ (for instance) at the point where the canonical reordering
    needs to be blocked on a vowel sequence, and you are then in
    a situation where even though you are not ignoring canonical
    ordering (which in distributed systems you cannot), you
    end up preserving the order you need, anyway.
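    Why a ZWJ works as a blocker is easiest to see from the canonical
    ordering step itself. The Python sketch below (canonical_order is a
    hypothetical name) assumes input that needs no decomposition, which
    holds for plain Hebrew letters and points. It stably sorts each run
    of combining marks by combining class. Any class-0 character, ZWJ
    included, ends a run, so marks on opposite sides of it are never
    compared.

```python
import unicodedata


def canonical_order(text: str) -> str:
    """Sketch of canonical ordering for text that needs no decomposition:
    each maximal run of combining marks (class > 0) is stably sorted by
    combining class; a class-0 character, such as a base letter or ZWJ,
    ends the run.
    """
    result, run = [], []
    for ch in text:
        if unicodedata.combining(ch) == 0:
            result.extend(sorted(run, key=unicodedata.combining))  # sorted() is stable
            run = []
            result.append(ch)
        else:
            run.append(ch)
    result.extend(sorted(run, key=unicodedata.combining))
    return "".join(result)


LAMED, PATAH, HIRIQ, ZWJ = "\u05DC", "\u05B7", "\u05B4", "\u200D"
print(canonical_order(LAMED + PATAH + HIRIQ) == LAMED + HIRIQ + PATAH)              # True: reordered
print(canonical_order(LAMED + PATAH + ZWJ + HIRIQ) == LAMED + PATAH + ZWJ + HIRIQ)  # True: blocked
```

    This is not a replacement for unicodedata.normalize, which also
    performs decomposition and, for NFC, recomposition.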

    > I've spent nine months working on Biblical Hebrew rendering for the major
    > user community (the Society of Biblical Literature and their Font
    > Foundation partners), and their take on this is that a) they want a
    > solution that works with today's technology, and b) they will avoid Unicode
    > canonical ordering like the plague and use custom normalisations instead.

    And how is implementing a custom normalization not a matter of
    "developing a new implementation"? It doesn't even begin to
    deal with the problem of what happens if the text "escapes" out
    into the Internet context, which won't be using the same
    custom normalization.

    Implementing a "custom" text representational convention seems
    like a much more straightforward task to me.

    > When we conducted normalisation tests, switching from Unicode normalisation
    > to a custom normalisation that does not re-order vowels or meteg*, we
    > increased the number of unique consonant + mark(s) sequences encoded in the
    > Old Testament text by more than 340. This means that Unicode normalisation was
    > creating 340 textual ambiguities by treating lexically distinct sequences
    > as canonically equivalent. I don't think that kind of textual ambiguity is
    > 'overblown'.

    Introduce a canonical reordering blocker (cc=0) into the textual
    sequences which get ordered in ways that lead to textual ambiguities,
    and the textual ambiguities should go away.
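    To make the ambiguity count concrete, the sketch below groups a list
    of consonant + mark clusters by their NFC form and reports the groups
    in which distinct spellings merge. The input is a two-item toy list
    built around the patah/hiriq case from the quoted message, not corpus
    data, and collapsed_distinctions is a hypothetical helper. It does
    not reproduce the 340 figure.

```python
import unicodedata
from collections import defaultdict


def collapsed_distinctions(clusters):
    """Hypothetical helper: map each NFC form to the set of distinct input
    spellings that normalize to it, keeping only forms reached by 2+ spellings."""
    groups = defaultdict(set)
    for cluster in clusters:
        groups[unicodedata.normalize("NFC", cluster)].add(cluster)
    return {form: spellings for form, spellings in groups.items() if len(spellings) > 1}


LAMED, PATAH, HIRIQ, ZWJ = "\u05DC", "\u05B7", "\u05B4", "\u200D"
plain = [LAMED + PATAH + HIRIQ, LAMED + HIRIQ + PATAH]
blocked = [LAMED + PATAH + ZWJ + HIRIQ, LAMED + HIRIQ + PATAH]
print(len(collapsed_distinctions(plain)))    # 1: the two vowel orders merge
print(len(collapsed_distinctions(blocked)))  # 0: the ZWJ keeps them distinct
```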

    >
    > * Meteg re-ordering is in some respects even more problematic than
    > multi-vowel re-ordering; certainly it is a more common problem. The meteg
    > can occur to the left or right of a vowel. Sometimes the distinction is the
    > result of editorial intervention (see Kittel's original Biblia Hebraica
    > edition); left, right and hataf-intermediary meteg positioning are all found
    > in the ben Asher manuscripts. Unicode canonical ordering treats meteg as a
    > fixed position mark with a combining class higher than vowels, which
    > suggests that it always appears in the same position relative to vowels.
    > This is incorrect.
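    The claim about meteg's combining class can be checked against the
    Unicode Character Database as bundled with Python's unicodedata
    module. This sketch prints the classes of the vowel points U+05B0
    through U+05BB and of meteg (U+05BD). It then shows that
    normalization moves a meteg typed before qamats to after it. Bet is
    an arbitrary base letter chosen for illustration.

```python
import unicodedata

METEG = "\u05BD"
VOWELS = [chr(cp) for cp in range(0x05B0, 0x05BC)]  # SHEVA through QUBUTS

for ch in VOWELS + [METEG]:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: ccc={unicodedata.combining(ch)}")

# Meteg's class (22) is higher than every vowel's class, so canonical
# ordering always moves a meteg typed before a vowel to after it.
BET, QAMATS = "\u05D1", "\u05B8"
print(unicodedata.normalize("NFD", BET + METEG + QAMATS) == BET + QAMATS + METEG)  # True
```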

    This particular case might be amenable to the cloning of a Biblical
    meteg of different behavior than the existing meteg, or possibly
    something along the lines I have suggested above for the vowel
    ordering distinctions.

    If, however, you wait for a cloned meteg, then solutions await
    Unicode 4.1 (or Unicode 5.0), and any application will certainly
    be requiring the "development of a new implementation", since they
    are going to have to await the gradual rollout of generalized
    support for the new repertoire. In any case, any such approach
    requires reencoding of existing text and establishment of
    new text representational conventions. Why not seek a solution
    which can make appropriate distinctions using the existing
    repertoire, as well as the existing tools and implementations?

    --Ken


