Date: Tue Jul 08 2003 - 12:53:48 EDT
Peter Kirk wrote on 07/08/2003 08:18:33 AM:
> A couple of off list comments have made it clear to me that this
> proposal needs some clarification and adjustment...
> The solution for this sequence is as follows: Define a new combining
> character something like HEBREW LIGATURE PATAH HIRIQ with a canonical
> decomposition of hiriq - patah (yes, that way round) and a glyph with a
> hiriq to the left of a patah... But when
> this text is normalised into NFC, the sequence will first be reordered
> as hiriq - patah, and then this combination will be composed into the
> new ligature. That is correct, isn't it?
Yes, but I wouldn't call it a ligature; I'd call it a precomposed or
digraph character (and the glyph, I'd call a composite).
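To make the reordering step concrete, here is a minimal sketch using Python's standard unicodedata module. The choice of lamed as the base letter is only an assumption for illustration. It shows that a patah followed by a hiriq is canonically reordered to hiriq then patah, because hiriq has the lower combining class; with no precomposed character defined today, NFC and NFD give the same result.

```python
import unicodedata

LAMED = "\u05DC"  # HEBREW LETTER LAMED (arbitrary base letter for the example)
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ, canonical combining class 14
PATAH = "\u05B7"  # HEBREW POINT PATAH, canonical combining class 17

# Encoded in the order patah, hiriq.
encoded = LAMED + PATAH + HIRIQ

nfd = unicodedata.normalize("NFD", encoded)
nfc = unicodedata.normalize("NFC", encoded)

# Canonical reordering sorts the marks by combining class: hiriq (14)
# moves ahead of patah (17) in both normalization forms.
for label, text in (("encoded", encoded), ("NFD", nfd), ("NFC", nfc)):
    print(label, [unicodedata.name(ch) for ch in text])
```

If a precomposed character with the decomposition < hiriq, patah > were added, NFC would go one step further and compose the reordered pair; that step cannot be shown with current data.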
> So an application which renders
> the NFC text will see the new character and should render it according
> to its glyph. In NFD text, the hiriq - patah sequence remains, but it
> is, I think, customary if not required for the renderer to combine the
> glyphs into the defined ligature before rendering.
I'm not aware of anything that presently requires a renderer to combine
the characters into a composite glyph, or to present the sequence of
characters < hiriq, patah > with the hiriq to the left of the patah --
remember, the description of Hebrew currently in Unicode assumes that such
sequences don't occur.
But, in order for your solution to work, this rendering would *have* to be
required. The fixed position classes would have to be understood as fixed
relative positions; i.e. given this combination of marks, they are always
positioned relative to one another in a fixed way, regardless of their
encoded order. This would assume that any other positioning will never
occur or be required -- true for cases that we know of, but it is possible
that there are cases we do not know of, and that such a user need could
exist in the future. You also haven't said anything about how to deal with
accents that occur between the two vowel marks (though you did notice the
issue), and the alternative of that same accent occurring either to the
left or to the right of the pair of vowel marks (which offhand seems a
likely potentiality with at least meteg -- I can't check that now since
I'm away from the office); and these would have to be dealt with as well.
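The meteg problem can be sketched the same way. This assumes, purely for illustration, that an author would try to signal the meteg's placement through encoded order, and again uses lamed as an arbitrary base. Since meteg has a higher combining class than either vowel, all three encodings collapse to one sequence under normalization, so the intended placement cannot survive in the text.

```python
import unicodedata

LAMED = "\u05DC"  # arbitrary base letter for the example
HIRIQ = "\u05B4"  # canonical combining class 14
PATAH = "\u05B7"  # canonical combining class 17
METEG = "\u05BD"  # HEBREW POINT METEG, canonical combining class 22

# Three encoded orders an author might use to express where the meteg sits.
variants = {
    "meteg first": LAMED + METEG + PATAH + HIRIQ,
    "meteg between": LAMED + PATAH + METEG + HIRIQ,
    "meteg last": LAMED + PATAH + HIRIQ + METEG,
}

normalized = {name: unicodedata.normalize("NFC", text)
              for name, text in variants.items()}

# Collect the distinct normalized forms; the distinction between the
# three inputs is lost if only one remains.
distinct = set(normalized.values())
for name, text in normalized.items():
    print(name, [f"U+{ord(ch):04X}" for ch in text])
print("distinct normalized forms:", len(distinct))
```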
Also, if the rendering of the sequence < hiriq, patah > is required to
have hiriq to the left of the patah, then what's the point of having the
additional digraph character? None that I can see. So, a simpler solution
would be simply to specify the relative ordering of certain combinations of
vowel marks, regardless of the order in which they are encoded. But we'd
still have the other issues I mentioned in the preceding paragraph.
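As a rough sketch of what that simpler rule could look like on the rendering side: the table and function names below are hypothetical, not taken from any renderer or from the standard. The idea is to key the visual order on the set of marks present, so the encoded order has no effect.

```python
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ
PATAH = "\u05B7"  # HEBREW POINT PATAH

# Hypothetical table: for each attested combination of vowel marks, the
# left-to-right visual order in which their glyphs should be placed.
FIXED_VISUAL_ORDER = {
    frozenset({HIRIQ, PATAH}): (HIRIQ, PATAH),  # hiriq to the left of patah
}


def visual_order(marks):
    """Return the marks in left-to-right visual order.

    Combinations listed in the table are placed in their fixed relative
    order whatever order they were encoded in; anything else is returned
    in its encoded order.
    """
    return FIXED_VISUAL_ORDER.get(frozenset(marks), tuple(marks))


# Either encoded order gives the same visual result.
print(visual_order([PATAH, HIRIQ]) == visual_order([HIRIQ, PATAH]))
```

A lookup like this says nothing about an accent falling between or beside the two vowels, so the issues from the preceding paragraph remain.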
It is occurring to me that perhaps there is a way to address the stability
issues that are a concern for IETF while fixing the combining classes for
other purposes. I need to think about that some more, but that is seeming
to me like (if the details can be worked out) the best hope for finding a
solution without having a bunch of "Yeah, but..."s to deal with.
> Of course we could simply store the reversed order without defining a
> new character. But renderers would then need clear instruction somewhere
> in the Unicode text that, as an exception to the normal rules for
> rendering multiple diacritics, the hiriq should be positioned to the
> left of the patah and similarly for the other attested sequences.
As mentioned above, this would be necessary anyway for your solution to
work.
Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485