Re: Collation contractions and reordering, was: Hebrew composition model, with cantillation marks

From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Tue Nov 04 2003 - 15:00:00 EST


    Peter Kirk wrote:
    > On 03/11/2003 15:26, Markus Scherer wrote:
    >> I suggest you try it out -
    >> http://oss.software.ibm.com/cgi-bin/icu/lx/en_US/utf-8/?_=he&EXPLORE_CollationElements
    >>
    >> ICU implements the UCA, including discontiguous contractions.
    >>
    > Thank you, Markus. Unfortunately the results are barely usable because
    > they are in Arial Unicode MS or something (and cannot be changed) which
    > simply fails to give a meaningful display of pointed Hebrew. There is a
    > clear need for a mechanism for the user to specify a useful display font
    > with good support for the text in question.

    Sorry for that. I told the developer of the Locale Explorer about this, and he will look into it,
    although it may take a few weeks. He is thinking about an option to turn off emitting any CSS.

    In the meantime, copying and pasting the results into another application should work. You may
    also be able to disable CSS or override the page's font setting in your browser.

    > But one thing is immediately clear. I sorted a set of shins with various
    > combinations of shin and sin dot and dagesh, each followed by alef and
    > separately by bet. The default collator sorted all the shin alefs before
    > all the shin bets. This is probably correct for modern Hebrew. It is not
    > the preferred ordering for biblical Hebrew.

    That is possible; I am not an expert in Hebrew at all, in collation or otherwise. I simply suggested this
    demo as a way to try out how the UCA works. In this case, the normalization on/off option is
    important for the discussion.
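
To illustrate why the normalization option matters here, the following is a minimal sketch using only Python's standard `unicodedata` module, not ICU itself. It shows that a shin typed with the shin dot before the dagesh is canonically equivalent to, but not code-point-identical with, the canonically ordered sequence. My assumption is that a collator running with normalization off sees the marks in whatever order they were typed, so the two spellings may produce different collation elements. The variable names are illustrative only.

```python
import unicodedata

SHIN = "\u05E9"      # HEBREW LETTER SHIN
DAGESH = "\u05BC"    # HEBREW POINT DAGESH OR MAPIQ
SHIN_DOT = "\u05C1"  # HEBREW POINT SHIN DOT

# Two spellings of the same pointed letter, differing only in mark order.
typed = SHIN + SHIN_DOT + DAGESH      # dot entered before dagesh
canonical = SHIN + DAGESH + SHIN_DOT  # marks in canonical order


def codepoints(text):
    """Render a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# NFD applies canonical reordering by combining class (lower class first).
normalized = unicodedata.normalize("NFD", typed)

print("typed:     ", codepoints(typed))
print("normalized:", codepoints(normalized))
print("dagesh class:", unicodedata.combining(DAGESH))
print("shin dot class:", unicodedata.combining(SHIN_DOT))
```

The dagesh has a lower canonical combining class than the shin dot, so normalization moves it in front of the dot. Comparing the two spellings without normalizing first treats them as different strings.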

    Note that Hebrew collation in ICU uses not just the UCA but also a tailoring. You can edit the
    tailoring in the online demo, and even supply your own entirely. You could work out a tailoring for
    biblical Hebrew and use it in the demo as well as with runtime ICU libraries.
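
As a rough illustration of what a tailoring changes, here is a toy two-level sort key in plain Python. It is not ICU, not the UCA algorithm, and the weights are invented for the example. The "tailored" variant treats shin plus sin dot as a contraction with its own primary weight, matching even when a dagesh intervenes (a discontiguous contraction). Placing sin as a separate letter before shin is purely an assumption for illustration, not a claim about the correct biblical Hebrew order.

```python
ALEF, BET, SHIN = "\u05D0", "\u05D1", "\u05E9"
DAGESH, SHIN_DOT, SIN_DOT = "\u05BC", "\u05C1", "\u05C2"

# Invented weights, not real UCA values.
PRIMARY = {ALEF: 1, BET: 2, SHIN: 30}
SECONDARY = {DAGESH: 1, SHIN_DOT: 2, SIN_DOT: 3}
SIN_PRIMARY = 29  # hypothetical tailoring: sin as its own letter before shin


def sort_key(text, tailored=False):
    """Return (primary weights, secondary weights) for a pointed string.

    Characters outside the toy tables are ignored.
    """
    primaries, secondaries = [], []
    for ch in text:
        if ch in PRIMARY:
            primaries.append(PRIMARY[ch])
            secondaries.append(0)  # common secondary weight for a base letter
        elif ch in SECONDARY:
            last_is_shin = bool(primaries) and primaries[-1] == PRIMARY[SHIN]
            if tailored and ch == SIN_DOT and last_is_shin:
                # Contraction shin + sin dot; marks in between are skipped over.
                primaries[-1] = SIN_PRIMARY
            else:
                secondaries.append(SECONDARY[ch])
    return (tuple(primaries), tuple(secondaries))


words = [
    SHIN + SHIN_DOT + ALEF,
    SHIN + SIN_DOT + ALEF,
    SHIN + SHIN_DOT + BET,
    SHIN + DAGESH + SIN_DOT + BET,  # dagesh sits between shin and sin dot
]

default_order = sorted(words, key=sort_key)
tailored_order = sorted(words, key=lambda w: sort_key(w, tailored=True))
```

With the default key the dots only matter at the secondary level, so both alef words sort ahead of both bet words, which is the behaviour described in the quoted message. With the tailored key the two sin words are grouped together first because the dot now changes the primary weight.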

    If you think that the current Hebrew tailoring is incorrect, then the best place to file a bug is
    the CLDR project: http://www.openi18n.org/subgroups/lade/locale/index.htm

    Best regards,
    markus


