Re: Yerushala(y)im - or Biblical Hebrew (was Major Defect in Combining Classes of Tibetan Vowels)

From: Kenneth Whistler (kenw@sybase.com)
Date: Thu Jun 26 2003 - 18:04:55 EDT


    Jony took the words right out of my mouth:

    > How about RLM?
    >
    > Jony

    RLM already belongs, naturally, in the context of Hebrew text
    handling, which is going to have to handle bidi controls anyway.

    Another possibility to consider is U+2060 WORD JOINER, the
    version of the zero width non-breaking space unfreighted with
    the BOM confusion of U+FEFF.

    WJ is also (gc=Cf, ccc=0), so it would block canonical reordering
    of a sequence it was inserted into. Unlike ZWJ, it should have no
    potentially conflicting semantics regarding ligation or anything
    else for display. It is *defined* only as specifying no break
    opportunity at its position:

      "...inserting a word joiner between two characters has no
      effect on their ligating and cursive joining behavior. The
      word joiner should be ignored in contexts other than word
      or line breaking."
      
    Well, as before, we already know that <lamed, patah, hiriq>
                                                        ^
    is not a word or line break opportunity, so inserting a WJ
    there should have no effect. And by definition, it should also
    have no effect on any glyph ligation (or any other aspect of
    the display). But it *would* break up the sequence that
    gets canonically reordered for normalization, thus enabling
    a textual distinction to be preserved.
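
A minimal sketch of the reordering argument above, assuming Python's
standard unicodedata module and taking patah as U+05B7 and hiriq as
U+05B4 (the variable names are purely illustrative). It only checks
normalization behavior; it says nothing about how a given renderer
treats WJ between two points.

```python
import unicodedata

LAMED, PATAH, HIRIQ = "\u05DC", "\u05B7", "\u05B4"
WJ = "\u2060"  # WORD JOINER

def codepoints(s):
    """Format a string as a list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)

# <lamed, patah, hiriq> as typed, and the same with WJ between the points.
plain = LAMED + PATAH + HIRIQ
joined = LAMED + PATAH + WJ + HIRIQ

# Canonical combining classes drive the reordering: hiriq has a lower
# class than patah, so normalization moves it in front of patah.
for ch in (PATAH, HIRIQ, WJ):
    print(f"U+{ord(ch):04X}",
          "gc =", unicodedata.category(ch),
          "ccc =", unicodedata.combining(ch))

plain_nfc = unicodedata.normalize("NFC", plain)
joined_nfc = unicodedata.normalize("NFC", joined)

print("plain  :", codepoints(plain), "->", codepoints(plain_nfc))
print("with WJ:", codepoints(joined), "->", codepoints(joined_nfc))
```

With WJ in place, each point sits in its own combining sequence, so
the original order survives normalization.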

    One might even want to suggest that if RichEdit or some other
    text control causes a display problem when WJ is inserted between
    two Hebrew points, that should be considered a bug in the
    implementation of the WORD JOINER for that text control.

    Of course, I'm not privy to the internals of such implementations
    and don't understand the font lookup issues in the kind of
    detail that John clearly does, but if WORD JOINER cannot
    be implemented as the standard says it should be, then we've
    got a more serious problem on our hands than just the
    Biblical Hebrew vocalization issue.

    --Ken

    > >
    > > At 04:26 AM 6/26/2003, Jony Rosenne wrote:
    > >
    > > >I don't think we need any new characters, ZERO WIDTH SPACE would
    > > >do and it requires no new semantics.
    > >
    > > ZERO WIDTH SPACE would screw up search and sort algorithms, I think,
    > > because it is not a control character per se and may not be
    > > ignored as desired.
    > >
    > > I've made some tests using Ken's ZWJ suggestion and, as feared, it
    > > messes with the glyph positioning lookups. The results varied
    > > slightly between MS RichText clients and InDesign ME, but both
    > > displayed marks incorrectly when ZWJ was inserted. I strongly
    > > suspect that this is not something that can easily be resolved in
    > > the glyph shaping model.
    > >
    > > John Hudson


