Re: Biblical Hebrew (Was: Major Defect in Combining Classes of Tibetan Vowels)

From: Philippe Verdy
Date: Fri Jun 27 2003 - 10:25:49 EDT

    On Friday, June 27, 2003 3:23 PM, Karljürgen Feuerherm wrote:
    > > At 04:22 -0500 2003-06-27, wrote:
    > Now, Q: I take it the combining classes are linked to the script,
    > rather than say to a dialect--e.g. one can't define BH as a separate
    > dialect from MH with its own set of rules? (I assume this is the case
    > because otherwise someone would have proposed it already.)
    > I REALLY think that option 1 should be beaten to death with a stick,
    > then beaten to death again, before settling for one of the others.
    > Hoping this didn't sound like a pointless diatribe but rather that
    > taking a step back from the details might help?

    Do you then propose to create a specific character, for use within the Hebrew script only, as a way to specify an alternate order for Hebrew cantillation? In that case, it would be more appropriate to define new standard variants of these cantillation marks, and list them among the supported variants, to be used specifically for Biblical Hebrew.

    The rule for their use must be simple, however: the variant selector must be legal before any cantillation mark, even where it is not strictly necessary (for example between a base Hebrew character and a Hebrew point, or between two Hebrew points whose canonical ordering under normalization is not defective).

    This would allow a simple transcoding algorithm for the existing encoded texts (using only the ISO 10646 encoding rules), and would allow the transformed text to be optimized further by removing variant selectors where they are not strictly necessary.
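
    To make the two passes concrete, here is a rough Python sketch, not a worked-out proposal. Everything in it is an assumption made for illustration: no variant selector is defined for Hebrew today, so U+FE00 merely stands in for one; the range U+0591..U+05AE is taken as "the cantillation marks"; and the function names are invented. A selector counts as unnecessary when dropping it leaves the normalized order of the marks unchanged. Note that a selector placed before a mark only blocks reordering with what precedes it.

```python
import unicodedata

# Assumption: U+FE00 (VARIATION SELECTOR-1) is only a stand-in; no variant
# selector is actually registered for Hebrew cantillation marks.
VS = "\uFE00"

def is_cantillation(ch):
    # Assumption: treat the Hebrew accents U+0591..U+05AE as the marks concerned.
    return "\u0591" <= ch <= "\u05AE"

def add_selectors(text):
    """Transcoding pass: put a selector before every cantillation mark."""
    out = []
    for ch in text:
        # Skip marks that already have a selector in front of them.
        if is_cantillation(ch) and (not out or out[-1] != VS):
            out.append(VS)
        out.append(ch)
    return "".join(out)

def mark_order(text):
    """Order of the characters after NFD, ignoring the selectors."""
    return unicodedata.normalize("NFD", text).replace(VS, "")

def drop_unneeded_selectors(text):
    """Optimization pass: remove each selector whose removal does not
    change the normalized order of the marks."""
    target = mark_order(text)
    i = 0
    while i < len(text):
        if text[i] == VS:
            candidate = text[:i] + text[i + 1:]
            if mark_order(candidate) == target:
                text = candidate   # selector was superfluous
                continue           # re-examine the character now at index i
        i += 1
    return text

# BET with two accents whose combining classes (230, then 220) would make
# normalization swap them.
sample = "\u05D1\u0592\u0596"
protected = add_selectors(sample)
optimized = drop_unneeded_selectors(protected)
```

    With this sample, the selector between the two accents has to stay, while the one directly after the base letter can be dropped. The optimization pass is quadratic as written, which is fine for a sketch but not for a real converter.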

    This way, we would not override the semantics of the existing ZWJ or CGJ characters, which were initially created to be used only before a base character, to join combining sequences in the renderer or to disallow a candidate break. The breaking algorithms are already complex enough that we should avoid adding special semantics to these characters.

    By contrast, variant selectors are much cleaner, and the extra optimization that removes superfluous ones can be added to UAX #15, simply because variant selectors are only legal (and thus stable) in the predefined sequences.

    Variant selectors do not break the stability pact, because under this pact, a <VS, character> sequence is considered (for XML and other related standards) as distinct from the isolated character without the variant selector, and thus can have distinct character properties.
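
    As a quick check of the "distinct sequence" point, the following Python sketch compares a string with and without a selector under the four normalization forms. It assumes U+FE00 as a stand-in selector (none is registered for Hebrew) and uses an arbitrary sample of BET plus two accents; the helper name is invented.

```python
import unicodedata

# Assumption: U+FE00 stands in for the proposed selector.
VS = "\uFE00"

def normal_forms(text):
    """Return the text in each of the four normalization forms."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return {form: unicodedata.normalize(form, text) for form in forms}

plain = "\u05D1\u0592\u0596"              # BET + two accents (classes 230, 220)
marked = "\u05D1\u0592" + VS + "\u0596"   # same, with a selector between them

plain_forms = normal_forms(plain)
marked_forms = normal_forms(marked)

# The selector has combining class 0, so marks never reorder across it.
selector_class = unicodedata.combining(VS)

# True if no normalization form drops the selector or merges the two strings.
stays_distinct = all(
    VS in marked_forms[form] and marked_forms[form] != plain_forms[form]
    for form in marked_forms
)
```

    In this sample the plain string has its two accents swapped by normalization, while the marked string passes through unchanged, so the two remain different strings in every form.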

    This also has the advantage that there is absolutely no need to recode all the existing documents written in Modern Hebrew; the problem can be isolated to just the few historic documents that are already encoded.

    -- Philippe.

    This archive was generated by hypermail 2.1.5 : Fri Jun 27 2003 - 11:11:45 EDT