Re: Collation contractions and reordering, was: Hebrew composition model, with cantillation marks

From: Peter Kirk
Date: Tue Nov 04 2003 - 13:10:23 EST


    On 03/11/2003 15:26, Markus Scherer wrote:

    > I suggest you try it out -
    > ICU implements the UCA, including discontiguous contractions.
    > markus
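Discontiguous contractions matter for exactly the sequences discussed below. The following minimal Python sketch assumes a hypothetical tailoring in which <shin, shin dot> and <shin, sin dot> are contractions. These entries are illustrative only and are not claimed to be in DUCET or in any ICU tailoring. Following the UCA matching steps, the matcher first takes the longest contiguous match. It may then absorb a later combining mark, provided that mark is not blocked, meaning no skipped mark in between has combining class 0 or a class greater than or equal to its own. In normalized text the dagesh (class 21) comes before the shin dot (class 24), so the shin-dot contraction still matches across the dagesh.

```python
import unicodedata

# Hypothetical tailoring: <shin, shin dot> and <shin, sin dot> are treated
# as contractions. Illustrative only; not claimed to be DUCET or ICU data.
CONTRACTIONS = {"\u05E9\u05C1", "\u05E9\u05C2"}
MAX_LEN = max(len(c) for c in CONTRACTIONS)


def next_unit(s):
    """Split s into (first collation unit, remaining text)."""
    # Longest contiguous match; a lone character if nothing longer matches.
    unit_len = 1
    for end in range(min(len(s), MAX_LEN), 1, -1):
        if s[:end] in CONTRACTIONS:
            unit_len = end
            break
    unit, rest = s[:unit_len], list(s[unit_len:])
    # Discontiguous step: try to extend the unit with unblocked non-starters.
    j, max_skipped = 0, 0
    while j < len(rest):
        ccc = unicodedata.combining(rest[j])
        if ccc == 0:
            break  # a starter ends the search
        if max_skipped < ccc and unit + rest[j] in CONTRACTIONS:
            unit += rest.pop(j)  # consume the mark; j now points at the next one
        else:
            max_skipped = max(max_skipped, ccc)  # skipped marks can block later ones
            j += 1
    return unit, "".join(rest)


def collation_units(text):
    s = unicodedata.normalize("NFD", text)  # UCA works on NFD text
    units = []
    while s:
        unit, s = next_unit(s)
        units.append(unit)
    return units


# Shin + dagesh + shin dot + alef: the shin-dot contraction matches even
# though the dagesh sits between the letter and the dot.
print([[hex(ord(c)) for c in u] for u in collation_units("\u05E9\u05BC\u05C1\u05D0")])
# prints [['0x5e9', '0x5c1'], ['0x5bc'], ['0x5d0']]
```

Whether a real collator should define such contractions at all is the open question of this thread. The sketch only shows the matching rule.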
    Thank you, Markus. Unfortunately the results are barely usable because
    they are displayed in Arial Unicode MS or a similar font (which cannot
    be changed), and that font simply fails to give a meaningful display of
    pointed Hebrew. There is a clear need for a mechanism that lets the user
    specify a display font with good support for the text in question.

    But one thing is immediately clear. I sorted a set of shins with various
    combinations of shin dot, sin dot and dagesh, each followed once by alef
    and once by bet. The default collator sorted all the shin-alef strings
    before all the shin-bet strings. This is probably correct for modern
    Hebrew, but it is not the preferred ordering for biblical Hebrew.
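The grouping follows from how a UCA-style collator compares levels. It compares all primary (base-letter) weights first. The points carry only secondary weights, so they are consulted only when the letters tie. The Python toy below models this with a two-level key. Its weights are hypothetical stand-ins, not DUCET values, chosen only so that dagesh < shin dot < sin dot at the secondary level. With those assumptions the key reproduces the order in the results listed further down.

```python
import unicodedata

SHIN, ALEF, BET = "\u05E9", "\u05D0", "\u05D1"
DAGESH, SHIN_DOT, SIN_DOT = "\u05BC", "\u05C1", "\u05C2"

# Toy weights (hypothetical, NOT real DUCET values). Letters get a primary
# weight and the minimal secondary weight 0x20; points are ignorable at the
# primary level and carry only a secondary weight.
PRIMARY = {ALEF: 1, BET: 2, SHIN: 3}
SECONDARY = {DAGESH: 0x21, SHIN_DOT: 0x22, SIN_DOT: 0x23}


def sort_key(text):
    s = unicodedata.normalize("NFD", text)  # canonical order: dagesh before dots
    primaries = tuple(PRIMARY[c] for c in s if c in PRIMARY)
    secondaries = tuple(SECONDARY.get(c, 0x20) for c in s)
    # Tuples compare element by element, so the secondary level only
    # matters when the primary tuples are equal.
    return primaries, secondaries


# The test set from the post, labelled as in its results listing.
SAMPLES = {
    "01": SHIN + SIN_DOT + ALEF,
    "02": SHIN + SIN_DOT + BET,
    "03": SHIN + ALEF,
    "04": SHIN + BET,
    "05": SHIN + DAGESH + SIN_DOT + ALEF,
    "06": SHIN + DAGESH + SIN_DOT + BET,
    "07": SHIN + SHIN_DOT + ALEF,
    "08": SHIN + SHIN_DOT + BET,
    "09": SHIN + DAGESH + SHIN_DOT + ALEF,
    "10": SHIN + DAGESH + SHIN_DOT + BET,
}

order = sorted(SAMPLES, key=lambda label: sort_key(SAMPLES[label]))
print(order)
# prints ['03', '09', '05', '07', '01', '04', '10', '06', '08', '02']
```

Real sort keys also have a tertiary level and use DUCET weights. The toy omits both and keeps only what is needed to show why primaries dominate.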

    Here are the results of the sort, which are slightly more legible when
    pasted into this message. They show the ordering shin, <shin, dagesh,
    shin dot>, <shin, dagesh, sin dot>, <shin, shin dot>, <shin, sin dot>. I
    would consider this incorrect even for modern Hebrew: sin and shin dots
    logically come before dagesh, so I would prefer <shin, shin dot>,
    <shin, dagesh, shin dot>, <shin, sin dot>, <shin, dagesh, sin dot>, or,
    better still, the same order with sin dot before shin dot.

    03 שא
    09 שּׁא
    05 שּׂא
    07 שׁא
    01 שׂא
    04 שב
    10 שּׁב
    06 שּׂב
    08 שׁב
    02 שׂב
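The preferred order can be sketched the same way, but canonical ordering gets in the way. In normalized text the dagesh (combining class 21) always precedes the shin dot (24) and sin dot (25). So "dots before dagesh" cannot come from string order and has to be built into the key itself. The Python sketch below gathers each letter's points and sorts them by a hypothetical secondary rank before comparing. The rank tables, and the choice to place undotted shin first, are assumptions for illustration, not an existing ICU tailoring. The primary level is left unchanged, so alef forms still precede bet forms.

```python
import unicodedata

SHIN, ALEF, BET = "\u05E9", "\u05D0", "\u05D1"
DAGESH, SHIN_DOT, SIN_DOT = "\u05BC", "\u05C1", "\u05C2"

# NFD always orders these points dagesh (21) < shin dot (24) < sin dot (25).
assert [unicodedata.combining(c) for c in (DAGESH, SHIN_DOT, SIN_DOT)] == [21, 24, 25]

PRIMARY = {ALEF: 1, BET: 2, SHIN: 3}  # toy primary weights (hypothetical)


def make_key(mark_rank):
    """Toy two-level key; mark_rank gives each point a hypothetical secondary
    weight, and points are compared in rank order, not in NFD order."""
    def key(text):
        clusters = []
        for c in unicodedata.normalize("NFD", text):
            if c in PRIMARY:
                clusters.append([c])
            else:
                clusters[-1].append(c)  # a point attaches to the preceding letter
        primaries = tuple(PRIMARY[cl[0]] for cl in clusters)
        secondaries = tuple(
            w for cl in clusters
            for w in [0x20] + sorted(mark_rank[m] for m in cl[1:])
        )
        return primaries, secondaries
    return key


# Hypothetical ranks: dots before dagesh, shin dot or sin dot first.
DOTS_FIRST = {SHIN_DOT: 0x21, SIN_DOT: 0x22, DAGESH: 0x23}
SIN_FIRST = {SIN_DOT: 0x21, SHIN_DOT: 0x22, DAGESH: 0x23}

SAMPLES = {
    "01": SHIN + SIN_DOT + ALEF, "02": SHIN + SIN_DOT + BET,
    "03": SHIN + ALEF, "04": SHIN + BET,
    "05": SHIN + DAGESH + SIN_DOT + ALEF, "06": SHIN + DAGESH + SIN_DOT + BET,
    "07": SHIN + SHIN_DOT + ALEF, "08": SHIN + SHIN_DOT + BET,
    "09": SHIN + DAGESH + SHIN_DOT + ALEF, "10": SHIN + DAGESH + SHIN_DOT + BET,
}


def order(mark_rank):
    key = make_key(mark_rank)
    return sorted(SAMPLES, key=lambda label: key(SAMPLES[label]))


print(order(DOTS_FIRST))  # prints ['03', '07', '09', '01', '05', '04', '08', '10', '02', '06']
print(order(SIN_FIRST))   # prints ['03', '01', '05', '07', '09', '04', '02', '06', '08', '10']
```

A production tailoring would express this in the collator's own rule syntax rather than in a custom key. The sketch only shows which comparisons have to change.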

    Peter Kirk (personal) (work)

    This archive was generated by hypermail 2.1.5 : Tue Nov 04 2003 - 13:47:35 EST