Re: Yerushala(y)im - or Biblical Hebrew

From: Peter Kirk (peter.r.kirk@ntlworld.com)
Date: Wed Jul 23 2003 - 08:13:54 EDT

    On 23/07/2003 03:20, Paul Nelson (TYPOGRAPHY) wrote:

    >Please look at the definition of CGJ and other such characters.
    >Understand the differences between CGJ and ZWJ/ZWNJ.
    >
    >This discussion is very disturbing to me because, after reading through
    >the L2 document register, it is unclear what the difference is between
    >CGJ and ZWJ use.
    >
    >The fact that you desire a control character to not be treated as such
    >greatly concerns me. This really feels like people are trying to figure
    >out any way to twist existing constructs to avoid fixing the
    >normalization weights. I am alarmed by the implications of putting
    >control characters in place to somehow subvert the normalization.
    >
    >In an ideal world we would simply correct these values. However, it has
    >been strongly communicated by the UTC that this cannot be done without
    >jeopardizing stability agreements with IETF. Peter Constable has posted a
    >document in the register on this topic that suggests a duplication of
    >characters as a solution.
    >
    >Can we please have this topic put on the agenda for the next meeting of
    >the UTC?
    >
    >Regards,
    >
    >Paul
    >
    I have been doing a little research into the defined properties of CGJ.
    I note also that according to
    http://www.unicode.org/book/preview/ch03.pdf it is defined in Unicode
    4.0 as a "Default Ignorable". Well, I am not surprised that some people
    are confused because
    http://www.unicode.org/Public/4.0-Update/UCD-4.0.0.html#Default_Ignorable_Code_Point
    tells me "For more information, see UAX #29: Text Boundaries
    <http://www.unicode.org/reports/tr29/>.", but the string "ignorable" is
    not found in UAX #29. But from a Google search I found
    http://www.unicode.org/review/pr-5.html, described as "/text excerpted
    from the Unicode Standard/" and given section number 5.22, so I suppose
    this is from the unpublished chapter 5 of Unicode 4.0. According to this,
    "Default ignorable code points are those that should be ignored by
    default in rendering (unless explicitly supported)... An implementation
    should ignore default ignorable characters in rendering whenever it does
    /not/ support the characters." So my suggestion that a renderer should
    simply ignore CGJ is far from twisting the requirements of Unicode. It
    is in fact a requirement of Unicode 4.0, though one that I am hardly
    surprised some people have missed.
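
    As a rough illustration of the two points above (not taken from any
    real renderer), the following Python sketch assumes the use under
    discussion is a CGJ placed between two Hebrew vowel points so that
    canonical reordering does not swap them. The helper name
    strip_unsupported_ignorables and the four-character set are
    hypothetical; the full Default_Ignorable_Code_Point set is defined in
    the Unicode Character Database and is much larger.

```python
import unicodedata

LAMED, PATAH, HIRIQ = "\u05dc", "\u05b7", "\u05b4"
CGJ = "\u034f"  # COMBINING GRAPHEME JOINER, canonical combining class 0

# Illustrative subset only (assumption): soft hyphen, CGJ, ZWNJ, ZWJ.
DEFAULT_IGNORABLE_SUBSET = frozenset({"\u00ad", CGJ, "\u200c", "\u200d"})


def strip_unsupported_ignorables(text, supported=frozenset()):
    """Drop default-ignorable characters the renderer does not support."""
    return "".join(
        ch for ch in text
        if ch not in DEFAULT_IGNORABLE_SUBSET or ch in supported
    )


# Without CGJ, normalization reorders patah (class 17) after hiriq (class 14).
plain = LAMED + PATAH + HIRIQ
reordered = unicodedata.normalize("NFC", plain)

# With CGJ between the points, the original order survives normalization.
joined = LAMED + PATAH + CGJ + HIRIQ
preserved = unicodedata.normalize("NFC", joined)

# A renderer with no special CGJ support ignores it just before display.
for_display = strip_unsupported_ignorables(preserved)
```

    Here the CGJ stays in the stored text and is dropped only at the
    display step, which is the "ignore by default" behaviour quoted above.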

    How a renderer goes about ignoring such a character internally is a
    matter for the particular implementation. John Hudson and I
    have suggested a mechanism for doing this with Uniscribe by treating the
    character internally as a normal character with a blank glyph and always
    ligating it with the preceding character. There may be other mechanisms
    which are cleaner. But in any case it seems to be a requirement not just
    for fixing this Hebrew problem but for conformance with Unicode as a
    whole that some such mechanism is implemented, so that CGJ is ignored by
    the renderer unless some specific behaviour is defined. In the case of
    rendering Hebrew, there seems to be no pressing need to define specific
    behaviour as the default is at least close to what is required.
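
    The blank-glyph idea can be sketched in a few lines of Python. This is
    a toy model under stated assumptions, not Uniscribe's API or internal
    design: the functions to_clusters and visible_glyphs are hypothetical
    names, and a "cluster" here is just a string of characters that are
    shaped together.

```python
CGJ = "\u034f"  # COMBINING GRAPHEME JOINER


def to_clusters(text):
    """Group characters so that each CGJ joins the preceding cluster."""
    clusters = []
    for ch in text:
        if ch == CGJ and clusters:
            # "Ligate" CGJ with whatever precedes it.
            clusters[-1] += ch
        else:
            # Ordinary character, or a CGJ with nothing before it.
            clusters.append(ch)
    return clusters


def visible_glyphs(clusters):
    """Treat CGJ as a blank glyph: it contributes nothing visible."""
    return [cluster.replace(CGJ, "") for cluster in clusters]


clusters = to_clusters("a" + CGJ + "b")
shown = visible_glyphs(clusters)
```

    In this model the CGJ never stands alone between two characters, so
    the visible result is the same as if it had been absent.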

    -- 
    Peter Kirk
    peter.r.kirk@ntlworld.com
    http://web.onetel.net.uk/~peterkirk/
    

