Re: Yerushala(y)im - or Biblical Hebrew

From: Peter Kirk (
Date: Tue Jul 29 2003 - 09:11:25 EDT

    On 28/07/2003 19:05, Kenneth Whistler wrote:

    > ...
    >This is, of course, precisely the desired result -- the CGJ is
    >ignored for weighting, but its presence prevents the reordering
    >of the vowels into the undesired sequence by normalization.
    >And the resultant weighted key weights the vowels in the correct
    >order.
    >Tailoring of the collation table could modify any of this, but
    >the above example is what you get just using the default table.
    >But it is important that people implementing searching and sorting
    >for Hebrew understand why and how the CGJ is "ignored" in this
    >context, in order to get correct results. For example, if you
    >strip the CGJ and *then* hand the string to the collation weighting
    >algorithm, normalization will again rearrange the points into
    >the wrong order for weighting.

    Thank you, Ken. In this particular case we might want to tailor the
    collation table so that this CGJ is effectively ignored. But I don't
    understand this aspect of Unicode well enough to know exactly what can
    be done.
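
    For anyone wanting to check the normalization half of Ken's point, here
    is a minimal sketch in Python using only the standard unicodedata
    module. It assumes, purely for illustration, a lamed carrying patah
    followed by hiriq; the variable names are my own. It does not compute
    collation weights (the standard library has no UCA implementation), so
    it only shows the reordering that happens with and without the CGJ.

```python
import unicodedata

# Illustrative code points (the choice of patah + hiriq is an assumption
# made for this sketch, not taken from the quoted message).
LAMED = "\u05DC"  # HEBREW LETTER LAMED, combining class 0
PATAH = "\u05B7"  # HEBREW POINT PATAH, combining class 17
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ, combining class 14
CGJ = "\u034F"    # COMBINING GRAPHEME JOINER, combining class 0

# Desired order: patah first, then hiriq.
plain = LAMED + PATAH + HIRIQ
with_cgj = LAMED + PATAH + CGJ + HIRIQ

# Without CGJ, canonical reordering sorts the points by combining class,
# so hiriq (14) moves in front of patah (17).
nfc_plain = unicodedata.normalize("NFC", plain)

# With CGJ (class 0) between them, the points cannot be swapped.
nfc_cgj = unicodedata.normalize("NFC", with_cgj)

# Stripping CGJ *before* normalizing loses the protection again.
stripped_then_normalized = unicodedata.normalize(
    "NFC", with_cgj.replace(CGJ, "")
)

def names(s):
    """Return the Unicode character names of s, for readable output."""
    return [unicodedata.name(ch) for ch in s]

print(names(nfc_plain))
print(names(nfc_cgj))
print(names(stripped_then_normalized))
```

    The third case is the one Ken warns about: if the CGJ is removed first
    and the string is then normalized on its way into the weighting
    algorithm, the points end up in the same order as if the CGJ had never
    been there.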

    Peter Kirk

    This archive was generated by hypermail 2.1.5 : Tue Jul 29 2003 - 09:46:20 EDT