Re: Yerushala(y)im - or Biblical Hebrew

From: Peter Kirk (peter.r.kirk@ntlworld.com)
Date: Wed Jul 23 2003 - 05:40:03 EDT


    On 22/07/2003 20:34, John Hudson wrote:

    > At 06:00 PM 7/22/2003, Rick McGowan wrote:
    >
    >> A solution with CGJ has been proposed, which is very general and can be
    >> applied to this and other such situations.
    >
    >
    > I get the impression that CGJ support is not very high on the list of
    > things going to be implemented any time soon by the application
    > developers that matter to us. I'm not saying this is right, only that
    > it raises practical concerns about recommending this solution. Other
    > control characters that have been around longer may not pose this
    > problem, but may still require updates to existing Hebrew engines. I'm
    > currently trying to figure out what works and what does not in the
    > existing implementations. We're already recommending ZWNJ to inhibit
    > meteg +hataf vowel ligation, but this has problems because the control
    > character breaks the mark positioning lookups. I've yet to determine
    > whether this is a fault in the font lookups, the shaping engine,
    > particular apps or text services, or something fundamental to the
    > architecture.
    >
    > John Hudson
    >
    > Tiro Typeworks www.tiro.com
    > Vancouver, BC tiro@tiro.com
    >
    >
    >
    I hope you are not suggesting that application developers are prepared
    to implement changes to support proposals which they themselves have
    put forward to the UTC, but are not prepared to implement changes to
    support alternative fixes to the same problems, fixes which the UTC may
    prefer because they are acceptable to users. That would be a defensible
    position if the alternative fix were much harder to implement than the
    developers' own proposal. But in this case the alternative fix, using
    CGJ, seems to be a trivial matter for a rendering engine. All it needs
    to do is delete any CGJ character from its input stream before it
    attempts any positioning, but not before doing any normalisation. Of
    course, this does not mean that any particular rendering engine can
    currently be programmed to do this.
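
    A minimal sketch of that ordering of steps, assuming Python and its
    standard unicodedata module. The function name prepare_for_positioning
    is hypothetical and does not belong to any existing rendering engine.
    The point is only that CGJ (U+034F, combining class 0) must still be
    present during normalisation, where it blocks canonical reordering, and
    can be dropped afterwards:

```python
import unicodedata

CGJ = "\u034F"  # COMBINING GRAPHEME JOINER, canonical combining class 0


def prepare_for_positioning(text: str, form: str = "NFC") -> str:
    """Normalise first, then drop CGJ so the positioning stage never sees it.

    Hypothetical helper for illustration only. Deleting CGJ *before*
    normalisation would let the marks it separates be reordered.
    """
    normalized = unicodedata.normalize(form, text)
    return normalized.replace(CGJ, "")


# lamed + patah + CGJ + hiriq: CGJ keeps patah ahead of hiriq through
# normalisation, and is then removed before positioning.
protected = prepare_for_positioning("\u05DC\u05B7\u034F\u05B4")

# Without CGJ, normalisation puts hiriq (class 14) before patah (class 17).
unprotected = prepare_for_positioning("\u05DC\u05B7\u05B4")
```

    Here protected and unprotected differ only in the order of the two
    vowel points, which is the distinction CGJ is meant to carry.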

    In fact, it seems to me that the biblical Hebrew rendering problems
    which I have heard about (on various lists and privately) could be
    solved easily by introducing a simple pre-processing pass into the
    rendering engine. (But this is not a fix for the Yerushala(y)im problem
    or the meteg ordering problem.) This pre-processing pass should sort
    any combination of a base letter and its following combining marks into
    an order which is efficient for the rendering engine. This need not be
    the Unicode canonical order; it could, for example, follow the "custom
    combining classes" of
    ftp://publisher.libronix.com/drop/Tiro/SBLHebrew-Distribution/SBLHebrew-Manual.pdf.
    The pass should also delete characters which are not actually to be
    rendered, e.g. CGJ. It would also satisfy the preference expressed in
    Unicode conformance requirement C9 in
    http://www.unicode.org/book/preview/ch03.pdf: "Ideally, an
    implementation would always interpret two canonical-equivalent character
    sequences identically."

    In any practical case this is a sort of no more than four or five
    combining characters according to fixed classes, so it can be performed
    very quickly if it is programmed into the rendering engine at a binary
    level (though not necessarily if it is attempted in the rendering
    engine's high level language, which is not designed for this). Short
    cuts such as hash tables can also be used for commonly encountered
    input orderings, including the Unicode canonical ordering.
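
    The pre-processing pass described above could be sketched roughly as
    follows, again assuming Python. The class values in custom_class are
    invented for illustration and are NOT the "custom combining classes"
    of the SBL Hebrew manual; the names custom_class, reorder_marks and
    preprocess are likewise hypothetical. Vowel points and meteg are
    deliberately given one shared class, so that the stable sort keeps
    whatever order survived normalisation, and functools.lru_cache stands
    in for the hash-table short cut:

```python
import unicodedata
from functools import lru_cache

CGJ = "\u034F"  # COMBINING GRAPHEME JOINER


def custom_class(ch: str) -> int:
    """Illustrative rendering class; values are assumptions, not SBL Hebrew's."""
    if ch in "\u05C1\u05C2":           # shin dot, sin dot
        return 10
    if ch in "\u05BC\u05BF":           # dagesh, rafe
        return 20
    if "\u05B0" <= ch <= "\u05BB" or ch == "\u05BD":
        return 30                      # vowel points and meteg share a class
    if "\u0591" <= ch <= "\u05AF":     # accents
        return 40
    return 50                          # any other combining mark


@lru_cache(maxsize=1024)               # cache for commonly seen mark runs
def reorder_marks(marks: str) -> str:
    visible = marks.replace(CGJ, "")   # CGJ is never rendered
    # sorted() is stable, so marks of equal class keep their input order.
    return "".join(sorted(visible, key=custom_class))


def preprocess(text: str) -> str:
    """Normalise, then reorder each run of marks that follows a base."""
    out, run = [], []
    for ch in unicodedata.normalize("NFD", text):
        if ch == CGJ or unicodedata.combining(ch) != 0:
            run.append(ch)
            continue
        if run:
            out.append(reorder_marks("".join(run)))
            run = []
        out.append(ch)
    if run:
        out.append(reorder_marks("".join(run)))
    return "".join(out)


# Two canonically equivalent spellings of shin + shin dot + dagesh + qamats
# come out identical, in the engine's preferred order.
a = preprocess("\u05E9\u05C1\u05BC\u05B8")
b = preprocess("\u05E9\u05B8\u05BC\u05C1")
```

    Normalising before sorting is what makes canonically equivalent
    inputs come out the same, in line with C9; a real engine would do the
    equivalent work on its own internal glyph or character buffers.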

    -- 
    Peter Kirk
    peter.r.kirk@ntlworld.com
    http://web.onetel.net.uk/~peterkirk/
    

