Re: Major Defect in Combining Classes of Tibetan Vowels

From: John Hudson
Date: Wed Jun 25 2003 - 19:47:44 EDT


    At 03:29 PM 6/25/2003, Kenneth Whistler wrote:

    > > This is not simply
    > > 'non-traditional' but results in incorrect rendering and a different
    > > vocalisation of the text.
    >I don't think this is true.
    >First, the intent of the (admittedly problematical) fixed position
    >combining classes was that the position of the relevant marks,
    >including the relevant Hebrew points, was fixed with respect to
    >the consonant base letter, so that application of one would not
    >impact the rendering of application of another.

    This idea of Hebrew vowels as 'fixed' marks is problematical, because in
    Biblical Hebrew they are not fixed: they move relative to additional marks
    (other vowels or cantillation marks).

    >It may be more *difficult* for applications to do correct rendering,
    >but there was never any intention in the standard that I know
    >of that a sequence <hiriq, patah> would render differently
    >than a sequence <patah, hiriq>.

    Yes, this is what I am saying is wrong: <hiriq, patah> *should* render
    differently from <patah, hiriq>. This example is particularly important,
    because it occurs in the spelling of yerushalaim, the Masoretic
    approximation of yerushalayim. Correct rendering requires that the hiriq
    follows the patah, and not vice versa.

    >And never any intent that it
    >would represent a "different vocalisation of the text".

    Fair enough for modern Hebrew. Fair enough for phonetically accurate
    Hebrew. Not good enough for Biblical Hebrew in which vocalisation reflects
    Masoretic pronunciation applied to ancient consonant structures.

    > > The point is that hiriq before patah is *not*
    > > canonically equivalent to patah before hiriq,
    >This is true.
    > > except in the erroneous
    > > assumption of the Unicode Standard: the order of vowels makes words sound
    > > different and mean different things.
    >This is not. The Unicode Standard makes no assumptions or claims
    >about what the phonological or meaning equivalence of <hiriq, patah>
    >or <patah, hiriq> is for Biblical Hebrew.

    But it does make assumptions about the canonical equivalence of the mark
    orders <U+05B4, U+05B7> and <U+05B7, U+05B4>, unless my understanding of
    the purpose of combining classes is completely mistaken. My understanding
    is that any ordering of two marks with different combining classes is
    canonically equivalent; further, I understand that some normalisation forms
    will re-order marks to move marks with lower combining class values closer
    to the base character. If the sequence <lamed, patah, hiriq, final mem> is
    what the text says, normalisation that re-orders the sequence as <lamed,
    hiriq, patah, final mem> is erroneous.
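    The re-ordering described above can be reproduced with a short sketch. It
    assumes Python's standard unicodedata module; the variable names are
    illustrative only, and the combining class values are those recorded in the
    Unicode Character Database (hiriq 14, patah 17).

```python
import unicodedata

# Code points for the sequence discussed above.
LAMED, PATAH, HIRIQ, FINAL_MEM = "\u05DC", "\u05B7", "\u05B4", "\u05DD"

# Canonical combining classes from the Unicode Character Database.
hiriq_ccc = unicodedata.combining(HIRIQ)  # 14
patah_ccc = unicodedata.combining(PATAH)  # 17

# The order the text calls for: patah first, then hiriq.
written = LAMED + PATAH + HIRIQ + FINAL_MEM

# Canonical ordering sorts adjacent marks by ascending combining class,
# so hiriq (lower class) is moved ahead of patah.
normalised = unicodedata.normalize("NFC", written)

# The opposite input order normalises to the same string, which is what
# "canonically equivalent" means in practice.
swapped = LAMED + HIRIQ + PATAH + FINAL_MEM
same_after_normalisation = (
    unicodedata.normalize("NFC", swapped) == normalised
)

for ch in normalised:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

    All four normalisation forms apply the same canonical ordering step, so NFD,
    NFKC and NFKD re-order this sequence in the same way.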

    >The fact that traditional Biblical Hebrew spelling prefers one
    >order of representation and canonically ordered Unicode text
    >specifies the opposite order may be a problem for implementations,
    >but that problem does not extend to the claims that John is
    >making here.

    This isn't a problem for implementations. This is a problem of Unicode
    canonical ordering re-ordering marks whose order is lexically significant.
    The fact that, in some cases, the canonical ordering also cannot be
    rendered with existing implementations simply makes the problem visually
    apparent.

    > > In order to correctly encode and render the Biblical Hebrew text, it is
    > > necessary to either a) never use normalisation routines that re-order
    > > marks (which is beyond the control of document authors), or b) re-classify
    > > the existing Hebrew marks so that all vowels are in a single class and
    > > will not be re-ordered during normalisation, or c) encode new marks for
    > > Biblical Hebrew with all vowels in a single class.
    >I don't think these conclusions follow from the current situation.
    >Such changes are certainly not necessary in order to *render*
    >Biblical Hebrew text correctly, nor to accurately represent
    >the content of Biblical Hebrew text.

    They are necessary to render Biblical Hebrew text correctly using current
    font and layout engine technologies. These technologies work perfectly for
    Biblical Hebrew so long as Unicode canonical ordering is ignored. I think
    there is very little impetus to change or develop new implementations to
    take into account what strikes most of those involved with Biblical Hebrew
    text processing as an error in Unicode.

    >The current situation is not optimal for implementations, nor
    >does canonically ordered text follow traditional preferences
    >for spelling order -- that we can agree on. But I think the
    >claims of inadequacy for the representation or rendering
    >of Biblical Hebrew text are overblown.

    I've spent nine months working on Biblical Hebrew rendering for the major
    user community (the Society of Biblical Literature and their Font
    Foundation partners), and their take on this is that a) they want a
    solution that works with today's technology, and b) they will avoid Unicode
    canonical ordering like the plague and use custom normalisations instead.
    When we conducted normalisation tests, switching from Unicode normalisation
    to a custom normalisation that does not re-order vowels or meteg*, we
    increased the number of unique consonant + mark(s) sequences encoded in the
    Old Testament text by more than 340. This means that Unicode normalisation was
    creating 340 textual ambiguities by treating lexically distinct sequences
    as canonically equivalent. I don't think that kind of textual ambiguity is
    acceptable.
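    The kind of comparison described above can be sketched roughly as follows.
    This is not the test code that was actually used: the function names are
    hypothetical, the order-preserving "custom normalisation" is reduced to
    leaving marks in the order typed, and the two-word sample is made up purely
    to show the counting.

```python
import unicodedata

def clusters(text):
    """Split text into base + combining mark(s) sequences.

    A character with a non-zero combining class is attached to the
    preceding base. (Marks with class 0 are not handled in this sketch;
    the Hebrew points and meteg all have non-zero classes.)
    """
    out = []
    for ch in text:
        if unicodedata.combining(ch) and out:
            out[-1] += ch
        else:
            out.append(ch)
    return out

def unique_marked_sequences(text, normalise):
    """Set of distinct base + mark(s) sequences after normalising."""
    return {c for c in clusters(normalise(text)) if len(c) > 1}

def unicode_nfd(text):
    # Standard normalisation, including canonical re-ordering of marks.
    return unicodedata.normalize("NFD", text)

def order_preserving(text):
    # Hypothetical stand-in for a custom normalisation that never
    # re-orders vowels or meteg: the marks stay as typed.
    return text

# Made-up sample: lamed with <patah, hiriq> and lamed with <hiriq, patah>.
sample = "\u05DC\u05B7\u05B4\u05DD \u05DC\u05B4\u05B7\u05DD"

count_unicode = len(unique_marked_sequences(sample, unicode_nfd))
count_custom = len(unique_marked_sequences(sample, order_preserving))
print(count_unicode, count_custom)
```

    On the sample, the two differently ordered sequences collapse into one under
    Unicode normalisation and remain distinct when mark order is preserved; the
    difference between the two counts is the number of merged sequences.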

    * Meteg re-ordering is in some respects even more problematic than
    multi-vowel re-ordering; certainly it is a more common problem. The meteg
    can occur to the left or right of a vowel. Sometimes the distinction is the
    result of editorial intervention (see Kittel's original Biblia Hebraica
    edition), but left, right and hataf-intermediary meteg positioning are all
    found in the ben Asher manuscripts. Unicode canonical ordering treats meteg as a
    fixed position mark with a combining class higher than vowels, which
    suggests that it always appears in the same position relative to vowels.
    This is incorrect.
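    The meteg case can be illustrated the same way. The sketch below again
    assumes Python's unicodedata module; the choice of bet as the base letter is
    arbitrary, and nothing here says which encoded order corresponds to which
    visual position, only that the two orders are collapsed into one.

```python
import unicodedata

BET, PATAH, METEG = "\u05D1", "\u05B7", "\u05BD"

# Meteg has a higher combining class than any Hebrew vowel point.
meteg_ccc = unicodedata.combining(METEG)  # 22
patah_ccc = unicodedata.combining(PATAH)  # 17

# Two encodings that an author might use to distinguish meteg positions.
meteg_first = BET + METEG + PATAH
vowel_first = BET + PATAH + METEG

# Canonical ordering always moves the vowel ahead of the meteg,
# so the distinction does not survive normalisation.
nfd_meteg_first = unicodedata.normalize("NFD", meteg_first)
nfd_vowel_first = unicodedata.normalize("NFD", vowel_first)
distinction_lost = nfd_meteg_first == nfd_vowel_first
print(distinction_lost)
```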

    John Hudson

    Tiro Typeworks
    Vancouver, BC

    If you browse in the shelves that, in American bookstores,
    are labeled New Age, you can find there even Saint Augustine,
    who, as far as I know, was not a fascist. But combining Saint
    Augustine and Stonehenge -- that is a symptom of Ur-Fascism.
                                                                 - Umberto Eco
