Re: Yerushala(y)im - or Biblical Hebrew

From: Mark Davis (mark.davis@jtcsv.com)
Date: Wed Jul 23 2003 - 17:13:21 EDT

    Exactly. See http://www.unicode.org/faq/normalization.html#8, for
    example. (Note: the last FAQ would change if the UTC accepts the
    proposal for usage of CGJ.)
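
    A minimal sketch of the point made in the quoted message, using the
    beth / qamats / dehi example from the thread. It assumes Python's
    standard unicodedata module; the helper names (combining_classes,
    canonically_equivalent) are illustrative and not taken from any
    specification. The classes it reports are the ones actually assigned
    in the Unicode data shipped with Python, which need not match the
    "preferred" values discussed in the thread.

```python
import unicodedata

# Code points for the example discussed in the thread.
BET = "\u05D1"     # HEBREW LETTER BET (the base character)
QAMATS = "\u05B8"  # HEBREW POINT QAMATS
DEHI = "\u05AD"    # HEBREW ACCENT DEHI


def combining_classes(text):
    """Return the canonical combining class of each character in text."""
    return [unicodedata.combining(ch) for ch in text]


def canonically_equivalent(a, b):
    """Compare two strings after canonical decomposition and reordering."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# The order a user might type: the dehi first, then the qamats.
typed = BET + DEHI + QAMATS

# Normalization sorts the marks by combining class, not by typing order.
normalized = unicodedata.normalize("NFD", typed)

print([unicodedata.name(ch) for ch in normalized])
print(combining_classes(normalized))
print(canonically_equivalent(typed, BET + QAMATS + DEHI))
```

    Both spellings normalize to the same sequence, so the stored order of
    the marks carries no linguistic information. Software that wants a
    logical order for rendering or editing has to derive it itself.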

    Mark
    __________________________________
    http://www.macchiato.com
    ► “Eppur si muove” ◄

    ----- Original Message -----
    From: "Peter Kirk" <peter.r.kirk@ntlworld.com>
    To: <unicode@unicode.org>
    Sent: Wednesday, July 23, 2003 07:24
    Subject: Re: Yerushala(y)im - or Biblical Hebrew

    > On 23/07/2003 06:37, Peter_Constable@sil.org wrote:
    >
    > >Philippe Verdy wrote on 07/22/2003 09:18:35 PM:
    > >
    > >>If there's an agreement about what should have been the best
    > >>combining classes...
    > >
    > >Describing what would be the best combining classes can be tricky for
    > >RTL scripts if the canonical ordering is intended not only for purposes
    > >of normalization and string comparison but also as a preferred order for
    > >storage and editing interaction. The reason is that the combining classes
    > >are intentionally based on visual relative position wrt the base
    > >character, not logical. Arbitrarily, a LTR ordering ... < below left <
    > >below < below right < ... is used, meaning that combinations of marks
    > >will be sequenced in the opposite order to the underlying line order, and
    > >so not in the logical order in terms of which users will be thinking. As
    > >an example using Hebrew, for a combination of (say) beth with qamats and
    > >dehi, preferred classes according to the visual basis on which classes
    > >are defined would be
    > >
    > >qamats = 220
    > >dehi = 222
    > >
    > >and so you'd get an encoded sequence of < beth, qamats, dehi >. But for
    > >the user, the pre-positive dehi, being to the right of the qamats, would
    > >probably be thought of as occurring before the qamats.
    > >
    > >Now, I said above that the classes were based arbitrarily on a visual
    > >LTR order. A RTL ordering ... < below right < below < below left < ...
    > >could have been used, but then the same mismatch would exist for LTR
    > >scripts. So, the problem is not with the arbitrary choice of LTR visual
    > >ordering for the classes.
    > >
    > >- Peter
    > >
    > >---------------------------------------------------------------------------
    > >Peter Constable
    > >
    > >Non-Roman Script Initiative, SIL International
    > >7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
    > >Tel: +1 972 708 7485
    > >
    > From Unicode 4.0 section 3.11,
    > http://www.unicode.org/book/preview/ch03.pdf: "The particular numeric
    > value of the combining class does not have any special significance; the
    > intent of providing the numeric values is /only/ to distinguish the
    > combining classes as being different, for use in equivalence
    > comparisons. ... The canonical order of character sequences does /not/
    > imply any kind of linguistic correctness or linguistic preference for
    > ordering of combining marks in sequences." There is therefore no reason
    > for combining classes to reflect ordering. The problem, if there is one,
    > is with rendering software which expects to receive an input stream in a
    > logical order although Unicode implies that the order is arbitrary,
    > especially when normalised forms are used for data exchange. The
    > implication of this is that rendering software should in general expect
    > to perform its own reordering.
    >
    > --
    > Peter Kirk
    > peter.r.kirk@ntlworld.com
    > http://web.onetel.net.uk/~peterkirk/


