RE: SPAM: Re: Yerushala(y)im - or Biblical Hebrew

From: Jony Rosenne (rosennej@qsm.co.il)
Date: Tue Jul 08 2003 - 11:38:15 EDT


    Just a reminder that the statement of the problem has not been agreed to. I
    don't see a vowel sequence in Yerushala(y)im.

    Jony

    > -----Original Message-----
    > From: unicode-bounce@unicode.org
    > [mailto:unicode-bounce@unicode.org] On Behalf Of Peter Kirk
    > Sent: Tuesday, July 08, 2003 3:19 PM
    > To: unicode@unicode.org
    > Subject: SPAM: Re: Yerushala(y)im - or Biblical Hebrew
    >
    >
    > On 08/07/2003 02:23, Peter Kirk wrote:
    >
    > >
    > > Would it work to define a new character, for example, for patah-hiriq,
    > > which has a canonical decomposition into patah plus hiriq, or even
    > > into hiriq plus patah? Would normalisation compose a patah-hiriq
    > > sequence into this character and so get round the reordering problem?
    > > Remember that the reverse sequence is actually not attested, as far as
    > > I can tell, for any of the sequences in question.
    > >
    > A couple of off-list comments have made it clear to me that this
    > proposal needs some clarification and adjustment. But I think it can
    > still be made to work. It is a nasty kludge, but then, as someone
    > pointed out, any solution to this problem is bound to be a nasty
    > kludge. In some ways it is less nasty than others that have been
    > suggested, and it doesn't have some of the disadvantages that have
    > been mentioned. It also has the advantage that no recoding of
    > existing text is required. That doesn't make it my preferred solution
    > (the CGJ solution is still that), but it is at least worth
    > considering.
    >
    > This solution requires adding a new character for each vowel sequence
    > found in Hebrew texts. Currently six such sequences have been
    > identified in the WTS Bible text, though one of these (sheva-hiriq) is
    > already in canonical order and so not a problem. So this needs fewer
    > new characters than the earlier proposal, but there may be other
    > sequences in other texts. This relies on the fact that none of these
    > sequences are found in reverse, although we cannot guarantee that this
    > is true for all texts. I will use the patah-hiriq sequence as an
    > example; all other sequences would be solved separately in the same
    > way.
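The reordering problem behind these sequences can be reproduced with Python's standard `unicodedata` module. The sketch below assumes only the standard Unicode assignments for the Hebrew points (sheva U+05B0 has canonical combining class 10, hiriq U+05B4 class 14, patah U+05B7 class 17) and uses lamed as an arbitrary base letter. Because patah has the higher class, normalisation moves it after hiriq, while sheva followed by hiriq is left alone.

```python
import unicodedata

# Hebrew code points (standard Unicode assignments)
LAMED = "\u05DC"  # HEBREW LETTER LAMED, used here as an arbitrary base
SHEVA = "\u05B0"  # HEBREW POINT SHEVA, canonical combining class 10
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ, canonical combining class 14
PATAH = "\u05B7"  # HEBREW POINT PATAH, canonical combining class 17


def describe(s: str) -> list[str]:
    """List each code point together with its canonical combining class."""
    return [f"U+{ord(c):04X} ccc={unicodedata.combining(c)}" for c in s]


# Patah typed before hiriq: classes 17 then 14, not in canonical order,
# so canonical reordering swaps the two marks.
typed = LAMED + PATAH + HIRIQ
nfd = unicodedata.normalize("NFD", typed)
nfc = unicodedata.normalize("NFC", typed)

# Sheva before hiriq: classes 10 then 14, already in canonical order.
sheva_hiriq = LAMED + SHEVA + HIRIQ

if __name__ == "__main__":
    print("typed:", describe(typed))
    print("NFD:  ", describe(nfd))
    print("NFC:  ", describe(nfc))
    print("sheva-hiriq unchanged by NFC:",
          unicodedata.normalize("NFC", sheva_hiriq) == sheva_hiriq)
```

Both NFD and NFC store the patah-hiriq example as lamed, hiriq, patah, since no precomposed character exists for these marks to compose into.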
    >
    > The solution for this sequence is as follows: define a new combining
    > character, something like HEBREW LIGATURE PATAH HIRIQ, with a
    > canonical decomposition into hiriq plus patah (yes, that way round)
    > and a glyph with a hiriq to the left of a patah. How does this help?
    > It will not affect users who type patah then hiriq, in non-canonical
    > order, into an application which does not immediately normalise the
    > text, as the renderer will still render hiriq to the left of patah as
    > typed. But when this text is normalised into NFC, the sequence will
    > first be reordered as hiriq-patah, and then this combination will be
    > composed into the new ligature. That is correct, isn't it? So an
    > application which renders the NFC text will see the new character and
    > should render it according to its glyph. In NFD text, the hiriq-patah
    > sequence remains, but it is, I think, customary if not required for
    > the renderer to combine the glyphs into the defined ligature before
    > rendering. So in every case the end user sees hiriq to the left of
    > patah, although in fact the underlying encoding is reversed.
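The normalisation steps in the proposal can be modelled in a short Python sketch. The proposed character was never encoded, so the Private Use Area code point U+E000 stands in for it (a hypothetical choice), and the final composition step is done by hand with a plain string replacement that ignores the blocking rules of real canonical composition. Treat it as a model of the behaviour the proposal assumes, not of any actual normaliser. One caveat: the composition exclusions in UAX #15 are likely to apply to a newly encoded precomposed character, in which case a real NFC implementation would leave the hiriq-patah sequence uncomposed.

```python
import unicodedata

HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ, combining class 14
PATAH = "\u05B7"  # HEBREW POINT PATAH, combining class 17

# HYPOTHETICAL: stand-in for the proposed HEBREW LIGATURE PATAH HIRIQ,
# which does not exist in Unicode. U+E000 is a Private Use Area code point.
LIGATURE_PATAH_HIRIQ = "\uE000"

# Proposed canonical decomposition: hiriq plus patah ("that way round").
DECOMPOSITIONS = {LIGATURE_PATAH_HIRIQ: HIRIQ + PATAH}
COMPOSITIONS = {seq: lig for lig, seq in DECOMPOSITIONS.items()}


def simulated_nfd(s: str) -> str:
    """Expand the stand-in ligature, then let real NFD reorder the marks."""
    expanded = "".join(DECOMPOSITIONS.get(c, c) for c in s)
    return unicodedata.normalize("NFD", expanded)


def simulated_nfc(s: str) -> str:
    """Reorder with real normalisation, then compose the pair by hand.

    Simplification: only adjacent hiriq-patah pairs are composed, so an
    intervening cantillation mark prevents composition in this sketch.
    """
    out = unicodedata.normalize("NFC", simulated_nfd(s))
    for seq, lig in COMPOSITIONS.items():
        out = out.replace(seq, lig)
    return out


if __name__ == "__main__":
    typed = "\u05DC" + PATAH + HIRIQ  # lamed, then patah typed before hiriq
    print([f"U+{ord(c):04X}" for c in simulated_nfd(typed)])
    print([f"U+{ord(c):04X}" for c in simulated_nfc(typed)])
```

In this model, text typed as patah then hiriq ends up as the stand-in ligature in simulated NFC and as hiriq followed by patah in simulated NFD, matching the two cases the proposal describes.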
    >
    > Have I missed anything vital here? I know that more study may be
    > needed of the interaction with cantillation marks, some of which can
    > appear between the patah and the hiriq.
    >
    > Of course, we could simply store the reversed order without defining a
    > new character. But renderers would then need a clear instruction
    > somewhere in the Unicode text that, as an exception to the normal
    > rules for rendering multiple diacritics, the hiriq should be positioned
    > to the left of the patah, and similarly for the other attested
    > sequences.
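The renderer exception described above can be sketched as a small placement rule. The sketch assumes a renderer that, by default, places below-base marks from right to left in stored order (so typed patah-hiriq shows hiriq on the left), and uses a hypothetical exception table containing only the patah-hiriq case; the other attested sequences would be added the same way. The function name and table are illustrative, not part of any real rendering engine.

```python
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ
PATAH = "\u05B7"  # HEBREW POINT PATAH

# HYPOTHETICAL exception table: stored (canonical-order) pairs whose visual
# order should be reversed. Only patah-hiriq is listed here.
REVERSED_PAIRS = {(HIRIQ, PATAH)}


def right_to_left_placement(marks: list[str]) -> list[str]:
    """Return below-base marks in visual order, rightmost first.

    Default (assumed): marks are placed right to left in stored order.
    Exception: a pair listed in REVERSED_PAIRS is placed the other way round.
    """
    placed = list(marks)
    i = 0
    while i < len(placed) - 1:
        if (placed[i], placed[i + 1]) in REVERSED_PAIRS:
            placed[i], placed[i + 1] = placed[i + 1], placed[i]
            i += 2  # skip past the swapped pair so it is not reconsidered
        else:
            i += 1
    return placed


if __name__ == "__main__":
    # Normalised storage (hiriq, patah) and typed storage (patah, hiriq)
    # both yield patah rightmost and hiriq to its left.
    print(right_to_left_placement([HIRIQ, PATAH]))
    print(right_to_left_placement([PATAH, HIRIQ]))
```

Under these assumptions both the normalised and the unnormalised storage orders give the same visual result, which is the outcome the exception rule is meant to guarantee.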
    >
    > --
    > Peter Kirk
    > peter.r.kirk@ntlworld.com
    > http://web.onetel.net.uk/~peterkirk/



    This archive was generated by hypermail 2.1.5 : Tue Jul 08 2003 - 11:41:06 EDT