Re: Yerushala(y)im - or Biblical Hebrew

From: Peter Kirk (peter.r.kirk@ntlworld.com)
Date: Wed Jul 23 2003 - 10:24:12 EDT

    On 23/07/2003 06:37, Peter_Constable@sil.org wrote:

    >Philippe Verdy wrote on 07/22/2003 09:18:35 PM:
    >
    >>If there's an agreement about what should have been the best
    >>combining classes...
    >
    >Describing what would be the best combining classes can be tricky for RTL
    >scripts if the canonical ordering is intended not only for purposes of
    >normalization and string comparison but also as a preferred order for
    >storage and editing interaction. The reason is that the combining classes
    >are intentionally based on visual relative position wrt the base character,
    >not logical. Arbitrarily, a LTR ordering ... < below left < below < below
    >right < ... is used, meaning that combinations of marks will be sequenced
    >in the opposite order to the underlying line order, and so not in the
    >logical order in terms of which users will be thinking. As an example using
    >Hebrew, for a combination of (say) beth with qamats and dehi, preferred
    >classes according to the visual basis on which classes are defined would be
    >
    >qamats = 220
    >dehi = 222
    >
    >and so you'd get an encoded sequence of < beth, qamats, dehi >. But for the
    >user, the pre-positive dehi, being to the right of the qamats, would
    >probably be thought of as occurring before the qamats.
    >
    >Now, I said above that the classes were based arbitrarily on a visual LTR
    >order. A RTL ordering ... < below right < below < below left < ... could
    >have been used, but then the same mismatch would exist for LTR scripts. So,
    >the problem is not with the arbitrary choice of LTR visual ordering for the
    >classes.
    >
    >- Peter
    >
    >
    >---------------------------------------------------------------------------
    >Peter Constable
    >
    >Non-Roman Script Initiative, SIL International
    >7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
    >Tel: +1 972 708 7485

    From Unicode 4.0 section 3.11,
    http://www.unicode.org/book/preview/ch03.pdf: "The particular numeric
    value of the combining class does not have any special significance; the
    intent of providing the numeric values is /only/ to distinguish the
    combining classes as being different, for use in equivalence
    comparisons. ... The canonical order of character sequences does /not/
    imply any kind of linguistic correctness or linguistic preference for
    ordering of combining marks in sequences." There is therefore no reason
    for combining classes to reflect the logical order in which users think
    of the marks. The problem, if there is one, is with rendering software
    which expects to receive an input stream in a logical order although
    Unicode implies that the order is arbitrary, especially when normalised
    forms are used for data exchange. The implication of this is that
    rendering software should in general expect to perform its own
    reordering.
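
    To illustrate the point about normalised forms, here is a rough
    sketch in Python using the standard unicodedata module. It is only
    an illustration, not part of the standard's text: the variable and
    function names are my own, and the combining classes it reports are
    the ones in whatever Unicode data your Python build ships, not the
    hypothetical 220/222 values discussed above. It shows that whichever
    order the marks are typed in, normalisation delivers them sorted by
    combining class, so a renderer cannot assume the typed order survives.

```python
import unicodedata

BETH = "\u05D1"    # HEBREW LETTER BET
QAMATS = "\u05B8"  # HEBREW POINT QAMATS
DEHI = "\u05AD"    # HEBREW ACCENT DEHI


def describe(text):
    """Return (character name, combining class) for each code point."""
    return [(unicodedata.name(ch), unicodedata.combining(ch)) for ch in text]


# Two orders in which the same visual cluster might be entered.
typed_dehi_first = BETH + DEHI + QAMATS
typed_qamats_first = BETH + QAMATS + DEHI

# Canonical ordering sorts adjacent combining marks by combining class.
nfd_dehi_first = unicodedata.normalize("NFD", typed_dehi_first)
nfd_qamats_first = unicodedata.normalize("NFD", typed_qamats_first)

for label, text in (
    ("typed dehi first", nfd_dehi_first),
    ("typed qamats first", nfd_qamats_first),
):
    print(label, "->", describe(text))
```

    Both lines of output should list the marks in the same order, which
    is why software receiving normalised data would have to do its own
    reordering if it wants the marks in some other sequence.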

    -- 
    Peter Kirk
    peter.r.kirk@ntlworld.com
    http://web.onetel.net.uk/~peterkirk/
    


    This archive was generated by hypermail 2.1.5 : Wed Jul 23 2003 - 11:10:11 EDT