From: Peter Kirk (firstname.lastname@example.org)
Date: Tue Jul 08 2003 - 09:18:33 EDT
On 08/07/2003 02:23, Peter Kirk wrote:
> Would it work to define a new character, for example, for patah-hiriq
> which has a canonical decomposition into patah plus hiriq, or even
> into hiriq plus patah? Would normalisation compose a patah-hiriq
> sequence into this character and so get round the reordering problem?
> Remember that the reverse sequence is actually not attested, as far as
> I can tell for any of the sequences in question.
A couple of off-list comments have made it clear to me that this
proposal needs some clarification and adjustment. But I think it can
still be made to work. It is a nasty kludge, but then, as someone pointed
out, any solution to this problem is bound to be a nasty kludge. In some
ways it is less nasty than others that have been suggested, and it
doesn't have some of the disadvantages that have been mentioned. It also
has the advantage that no recoding of existing text is required. That
doesn't make it my preferred solution (the CGJ solution is still that),
but it is at least worth considering.
This solution requires adding a new character for each vowel sequence
found in Hebrew texts. Currently six such sequences have been identified
in the WTS Bible text - though one of these (sheva-hiriq) is already in
canonical order and so not a problem. So this is fewer new characters
than the earlier proposal - but there may be other sequences in other
texts. This relies on the fact that none of these sequences are found in
reverse, although we cannot guarantee that this is true for all texts. I
will use the patah-hiriq sequence as an example; all other sequences
would be solved separately in the same way.
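
As a rough illustration of why sheva-hiriq is unaffected while
patah-hiriq is not, the sketch below compares canonical combining
classes using Python's standard unicodedata module. The helper name
in_canonical_order is made up for this example, and only the two
sequences named above are checked.

```python
import unicodedata

SHEVA = "\u05B0"  # HEBREW POINT SHEVA
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ
PATAH = "\u05B7"  # HEBREW POINT PATAH


def in_canonical_order(first: str, second: str) -> bool:
    """Hypothetical helper: True if this pair, in this order, is left
    alone by canonical reordering (a pair is swapped only when the second
    mark has a non-zero class lower than that of the first)."""
    a = unicodedata.combining(first)
    b = unicodedata.combining(second)
    return b == 0 or a <= b


for name, pair in (("sheva-hiriq", (SHEVA, HIRIQ)),
                   ("patah-hiriq", (PATAH, HIRIQ))):
    classes = [unicodedata.combining(c) for c in pair]
    print(name, classes, "in canonical order:", in_canonical_order(*pair))
```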
The solution for this sequence is as follows: Define a new combining
character something like HEBREW LIGATURE PATAH HIRIQ with a canonical
decomposition of hiriq - patah (yes, that way round) and a glyph with a
hiriq to the left of a patah. How does this help? Well, it will not
affect users who type patah then hiriq, in non-canonical order, into an
application which does not immediately normalise the text, as the
renderer will still render hiriq to the left of patah as typed. But when
this text is normalised into NFC, the sequence will first be reordered
as hiriq - patah, and then this combination will be composed into the
new ligature. That is correct, isn't it? So an application which renders
the NFC text will see the new character and should render it according
to its glyph. In NFD text, the hiriq - patah sequence remains, but it
is, I think, customary if not required for the renderer to combine the
glyphs into the defined ligature before rendering. So in every case the
end user sees hiriq to the left of patah, although in fact the
underlying encoding is reversed.
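
The steps above can be sketched in Python. The reordering part uses the
real unicodedata.normalize; the composition part is only a hand-written
stand-in, since the proposed character does not exist. Whether real NFC
would perform this composition is exactly the question raised above, so
toy_nfc and toy_nfd, and the private-use code point U+E000 standing in
for HEBREW LIGATURE PATAH HIRIQ, are assumptions made for illustration.

```python
import unicodedata

LAMED = "\u05DC"  # base letter, chosen arbitrarily for the example
HIRIQ = "\u05B4"
PATAH = "\u05B7"
LIGATURE = "\uE000"  # hypothetical: private-use stand-in for the proposed character


def toy_nfc(text: str) -> str:
    """Assumed behaviour: real NFC reordering, then compose the
    reordered hiriq - patah pair into the hypothetical ligature."""
    reordered = unicodedata.normalize("NFC", text)
    return reordered.replace(HIRIQ + PATAH, LIGATURE)


def toy_nfd(text: str) -> str:
    """Assumed behaviour: expand the hypothetical ligature to its
    proposed decomposition (hiriq - patah), then apply real NFD."""
    return unicodedata.normalize("NFD", text.replace(LIGATURE, HIRIQ + PATAH))


typed = LAMED + PATAH + HIRIQ  # patah typed first, i.e. non-canonical order

# Existing normalisation already reorders the typed sequence.
print([f"U+{ord(c):04X}" for c in unicodedata.normalize("NFC", typed)])

# With the hypothetical character, the pair would collapse to one code point.
print([f"U+{ord(c):04X}" for c in toy_nfc(typed)])
print([f"U+{ord(c):04X}" for c in toy_nfd(toy_nfc(typed))])
```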
Have I missed anything vital here? I know that more study may be needed
of interaction with cantillation marks, some of which can appear between
the patah and the hiriq.
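
One way to start that study is to see what existing normalisation does
when a cantillation mark sits between the two vowels. The sketch below
uses etnahta purely as an arbitrary example of a cantillation mark; I
have not checked which marks actually occur in this position in the
texts.

```python
import unicodedata

LAMED = "\u05DC"    # base letter, chosen arbitrarily
HIRIQ = "\u05B4"
PATAH = "\u05B7"
ETNAHTA = "\u0591"  # HEBREW ACCENT ETNAHTA, an arbitrary example mark

# Patah, then the accent, then hiriq, as they might be typed.
typed = LAMED + PATAH + ETNAHTA + HIRIQ
normalized = unicodedata.normalize("NFC", typed)

for label, text in (("typed", typed), ("NFC", normalized)):
    print(label, [(f"U+{ord(c):04X}", unicodedata.combining(c)) for c in text])

# After reordering, the two vowels are adjacent and the accent follows them.
vowels_adjacent = (HIRIQ + PATAH) in normalized
print("hiriq - patah adjacent after normalisation:", vowels_adjacent)
```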
Of course we could simply store the reversed order without defining a
new character. But renderers would then need clear instruction somewhere
in the Unicode text that, as an exception to the normal rules for
rendering multiple diacritics, the hiriq should be positioned to the
left of the patah, and similarly for the other attested sequences.
-- Peter Kirk email@example.com http://web.onetel.net.uk/~peterkirk/