Re: Yerushala(y)im - or Biblical Hebrew

From: Peter_Constable@sil.org
Date: Tue Jul 08 2003 - 12:53:48 EDT


    Peter Kirk wrote on 07/08/2003 08:18:33 AM:

    > A couple of off list comments have made it clear to me that this
    > proposal needs some clarification and adjustment...

    > The solution for this sequence is as follows: Define a new combining
    > character something like HEBREW LIGATURE PATAH HIRIQ with a canonical
    > decomposition of hiriq - patah (yes, that way round) and a glyph with a
    > hiriq to the left of a patah... But when
    > this text is normalised into NFC, the sequence will first be reordered
    > as hiriq - patah, and then this combination will be composed into the
    > new ligature. That is correct, isn't it?

    Yes, but I wouldn't call it a ligature; I'd call it a precomposed or
    digraph character, and I'd call the glyph a composite.
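    The reordering step in the proposal follows from the canonical combining
    classes: hiriq (U+05B4) has class 14 and patah (U+05B7) has class 17, so
    normalization always moves hiriq ahead of patah. A minimal sketch using
    Python's standard unicodedata module; the lamed base letter is chosen
    only for illustration.

```python
import unicodedata

LAMED = "\u05DC"  # HEBREW LETTER LAMED (illustrative base letter)
PATAH = "\u05B7"  # HEBREW POINT PATAH
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ

# Patah stored before hiriq on the same consonant.
stored = LAMED + PATAH + HIRIQ

# Canonical reordering sorts adjacent marks by combining class, so the
# lower-class hiriq is moved ahead of patah in both NFD and NFC.
for form in ("NFD", "NFC"):
    normalized = unicodedata.normalize(form, stored)
    print(form, [(unicodedata.name(ch), unicodedata.combining(ch)) for ch in normalized])
```

    Both forms come out as lamed, hiriq, patah, because no precomposed
    character exists for the sequence today; the proposed digraph would
    only change what NFC composes it into.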

    > So an application which renders
    > the NFC text will see the new character and should render it according
    > to its glyph. In NFD text, the hiriq - patah sequence remains, but it
    > is, I think, customary if not required for the renderer to combine the
    > glyphs into the defined ligature before rendering.

    I'm not aware of anything that presently requires a renderer to combine
    the characters into a composite glyph, or to present the sequence of
    characters < hiriq, patah > with the hiriq to the left of the patah --
    remember, the description of Hebrew currently in Unicode assumes that such
    sequences don't occur.

    But, in order for your solution to work, this rendering would *have* to be
    required. The fixed position classes would have to be understood as fixed
    relative positions; i.e. given this combination of marks, they are always
    positioned relative to one another in a fixed way, regardless of their
    encoded order. This would assume that any other positioning will never
    occur or be required -- true for cases that we know of, but it is possible
    that there are cases we do not know of, and that such a user need could
    exist in the future. You also haven't said how to deal with accents that
    occur between the two vowel marks (though you did notice the issue), or
    with the possibility of that same accent occurring either to the left or
    to the right of the pair of vowel marks (which offhand seems likely, at
    least for meteg -- I can't check that now since I'm away from the
    office). These would have to be dealt with as well.
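    To see why a mark between the two vowels is a problem, note that meteg
    (U+05BD) has combining class 22, higher than both hiriq (14) and patah
    (17). The sketch below normalizes three hypothetical encodings that differ
    only in where meteg is placed; whether each placement is actually attested
    is the open question raised above, not something the code establishes.

```python
import unicodedata

LAMED = "\u05DC"  # HEBREW LETTER LAMED (illustrative base letter)
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ, class 14
PATAH = "\u05B7"  # HEBREW POINT PATAH, class 17
METEG = "\u05BD"  # HEBREW POINT METEG, class 22

# Hypothetical encodings that differ only in where meteg sits.
variants = {
    "meteg first": LAMED + METEG + HIRIQ + PATAH,
    "meteg between": LAMED + HIRIQ + METEG + PATAH,
    "meteg last": LAMED + HIRIQ + PATAH + METEG,
}

normalized = {label: unicodedata.normalize("NFD", seq) for label, seq in variants.items()}
for label, text in normalized.items():
    print(label, [unicodedata.name(ch) for ch in text])

# All three collapse to lamed, hiriq, patah, meteg: the encoded position is lost.
print(len(set(normalized.values())))  # 1
```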

    Also, if the rendering of the sequence < hiriq, patah > is required to
    have hiriq to the left of the patah, then what's the point of having the
    additional digraph character? None that I can see. So a simpler solution
    would be to specify the relative positioning of certain combinations of
    vowel marks, regardless of the order in which they are encoded. But we'd
    still have the other issues I mentioned in the preceding paragraph.
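    A renderer rule of that kind could be sketched as a lookup keyed on the
    set of co-occurring marks. The table name FIXED_VISUAL_ORDER and the
    function visual_order are hypothetical; nothing like this table exists in
    the Unicode Standard, and a real shaping engine would express it through
    font or layout rules rather than Python.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ
PATAH = "\u05B7"  # HEBREW POINT PATAH

# Hypothetical rule table: a set of co-occurring points -> fixed
# left-to-right visual placement.
FIXED_VISUAL_ORDER: Dict[FrozenSet[str], Tuple[str, ...]] = {
    frozenset({HIRIQ, PATAH}): (HIRIQ, PATAH),  # hiriq drawn to the left of patah
}


def visual_order(marks: List[str]) -> Optional[List[str]]:
    """Return the fixed left-to-right placement for these marks, if a rule exists.

    The lookup key is a set, so the encoded order of the marks is ignored.
    None means no rule applies and the renderer's default positioning is used.
    """
    rule = FIXED_VISUAL_ORDER.get(frozenset(marks))
    return list(rule) if rule is not None else None


# Both encoded orders get the same placement.
print(visual_order([PATAH, HIRIQ]) == visual_order([HIRIQ, PATAH]))  # True
```

    Because the key ignores encoded order, a separate precomposed hiriq-patah
    character would select exactly the same placement as either sequence,
    which is the redundancy pointed out above.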

    It occurs to me that perhaps there is a way to address the stability
    issues that are a concern for IETF while fixing the combining classes for
    other purposes. I need to think about that some more, but if the details
    can be worked out, that seems to me the best hope for finding a solution
    without a bunch of "Yeah, but..."s to deal with.
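    The stability concern can be illustrated by running the canonical
    ordering step (a stable sort of each run of non-starters by combining
    class, applied to decomposed text) under two class tables. The revised
    table here is purely hypothetical: it gives hiriq and patah an arbitrary
    shared class so their stored order would survive. The point is that the
    same stored string then normalizes differently, which is presumably what
    protocols relying on stable normalization, such as those the IETF
    maintains, cannot accept.

```python
import unicodedata
from typing import Callable, List

LAMED = "\u05DC"  # HEBREW LETTER LAMED (illustrative base letter)
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ, class 14
PATAH = "\u05B7"  # HEBREW POINT PATAH, class 17


def canonical_order(text: str, ccc: Callable[[str], int]) -> str:
    """Stable-sort each run of non-starters (ccc != 0) by combining class.

    Assumes the input is already fully decomposed, as in NFD.
    """
    out: List[str] = []
    run: List[str] = []
    for ch in text:
        if ccc(ch) == 0:
            # sorted() is stable, so marks with equal classes keep their order.
            out.extend(sorted(run, key=ccc))
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=ccc))
    return "".join(out)


def hypothetical_ccc(ch: str) -> int:
    """Hypothetical revised table: hiriq and patah share an arbitrary class."""
    if ch in (HIRIQ, PATAH):
        return 220
    return unicodedata.combining(ch)


stored = LAMED + PATAH + HIRIQ
current = canonical_order(stored, unicodedata.combining)
revised = canonical_order(stored, hypothetical_ccc)

print(current == unicodedata.normalize("NFD", stored))  # True
print(current == revised)  # False
```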

    > Of course we could simply store the reversed order without defining a
    > new character. But renderers would then need clear instruction somewhere
    > in the Unicode text that, as an exception to the normal rules for
    > rendering multiple diacritics, the hiriq should be positioned to the
    > left of the patah and similarly for the other attested sequences.

    As mentioned above, this would be necessary anyway for your solution to
    work.

    - Peter

    ---------------------------------------------------------------------------
    Peter Constable

    Non-Roman Script Initiative, SIL International
    7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
    Tel: +1 972 708 7485


