RE: SPAM: Re: Yerushala(y)im - or Biblical Hebrew

From: Jony Rosenne (rosennej@qsm.co.il)
Date: Tue Jul 08 2003 - 11:38:15 EDT

Next message: John Cowan: "Re: UTF-8 to UTF-16LE"

Previous message: Francois Yergeau: "RE: French group separators, was Re: The character for 10**24 in Japanese numbers (jo)"
In reply to: Peter Kirk: "Re: Yerushala(y)im - or Biblical Hebrew"
Next in thread: Peter Kirk: "Re: SPAM: Re: Yerushala(y)im - or Biblical Hebrew"
Reply: Peter Kirk: "Re: SPAM: Re: Yerushala(y)im - or Biblical Hebrew"
Reply: Karljürgen Feuerherm: "Re: SPAM: Re: Yerushala(y)im - or Biblical Hebrew"
Reply: Ted Hopp: "Re: Yerushala(y)im - or Biblical Hebrew"
Reply: John Cowan: "Re: SPAM: Re: Yerushala(y)im - or Biblical Hebrew"

Just a reminder that the statement of the problem has not been agreed to. I
don't see a vowel sequence in Yerushala(y)im.

Jony

> -----Original Message-----
> From: unicode-bounce@unicode.org
> [mailto:unicode-bounce@unicode.org] On Behalf Of Peter Kirk
> Sent: Tuesday, July 08, 2003 3:19 PM
> To: unicode@unicode.org
> Subject: SPAM: Re: Yerushala(y)im - or Biblical Hebrew
>
>
> On 08/07/2003 02:23, Peter Kirk wrote:
>
> >
> > Would it work to define a new character, for example, for patah-hiriq
> > which has a canonical decomposition into patah plus hiriq, or even
> > into hiriq plus patah? Would normalisation compose a patah-hiriq
> > sequence into this character and so get round the reordering problem?
> > Remember that the reverse sequence is actually not attested, as far as
> > I can tell, for any of the sequences in question.
> >
> A couple of off-list comments have made it clear to me that this
> proposal needs some clarification and adjustment. But I think it can
> still be made to work. It is a nasty kludge, but then as someone pointed
> out any solution to this problem is bound to be a nasty kludge. In some
> ways it is less nasty than others that have been suggested, and it
> doesn't have some of the disadvantages that have been mentioned. It also
> has the advantage that no recoding of existing text is required. That
> doesn't make it my preferred solution (the CGJ solution is still that),
> but it is at least worth considering.
>
> This solution requires adding a new character for each vowel sequence
> found in Hebrew texts. Currently six such sequences have been identified
> in the WTS Bible text - though one of these (sheva-hiriq) is already in
> canonical order and so not a problem. So this is fewer new characters
> than the earlier proposal - but there may be other sequences in other
> texts. This relies on the fact that none of these sequences are found in
> reverse, although we cannot guarantee that this is true for all texts. I
> will use the patah-hiriq sequence as an example; all other sequences
> would be solved separately in the same way.
>
> The solution for this sequence is as follows: Define a new combining
> character something like HEBREW LIGATURE PATAH HIRIQ with a canonical
> decomposition of hiriq - patah (yes, that way round) and a glyph with a
> hiriq to the left of a patah. How does this help? Well, it will not
> affect users who type patah then hiriq, in non-canonical order, into an
> application which does not immediately normalise the text, as the
> renderer will still render hiriq to the left of patah as typed. But when
> this text is normalised into NFC, the sequence will first be reordered
> as hiriq - patah, and then this combination will be composed into the
> new ligature. That is correct, isn't it? So an application which renders
> the NFC text will see the new character and should render it according
> to its glyph. In NFD text, the hiriq - patah sequence remains, but it
> is, I think, customary if not required for the renderer to combine the
> glyphs into the defined ligature before rendering. So in every case the
> end user sees hiriq to the left of patah, although in fact the
> underlying encoding is reversed.
>
> Have I missed anything vital here? I know that more study may be needed
> of interaction with cantillation marks, some of which can appear between
> the patah and the hiriq.
>
> Of course we could simply store the reversed order without defining a
> new character. But renderers would then need clear instruction somewhere
> in the Unicode text that, as an exception to the normal rules for
> rendering multiple diacritics, the hiriq should be positioned to the
> left of the patah and similarly for the other attested sequences.
>
> --
> Peter Kirk
> peter.r.kirk@ntlworld.com
> http://web.onetel.net.uk/~peterkirk/
>
>
>
>
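The reordering problem discussed in the quoted message can be observed directly with Python's standard unicodedata module. The sketch below is only an illustration: the choice of lamed as the base letter and the helper names are assumptions, not anything taken from the thread. It prints the canonical combining classes of the points involved and shows that normalisation swaps a typed patah-hiriq pair, leaves sheva-hiriq alone, and does not reorder across U+034F COMBINING GRAPHEME JOINER.

```python
import unicodedata

LAMED = "\u05DC"  # HEBREW LETTER LAMED, used here only as a base letter
SHEVA = "\u05B0"  # HEBREW POINT SHEVA
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ
PATAH = "\u05B7"  # HEBREW POINT PATAH
CGJ = "\u034F"    # COMBINING GRAPHEME JOINER


def ccc(ch):
    """Canonical combining class of a single character."""
    return unicodedata.combining(ch)


def codepoints(text):
    """Render a string as a list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in text)


def nfc(text):
    return unicodedata.normalize("NFC", text)


# Combining marks on one base are sorted by ascending combining class.
for name, ch in [("sheva", SHEVA), ("hiriq", HIRIQ), ("patah", PATAH), ("CGJ", CGJ)]:
    print(f"{name}: ccc={ccc(ch)}")

typed = LAMED + PATAH + HIRIQ            # patah then hiriq, as typed
print(codepoints(typed), "->", codepoints(nfc(typed)))

in_order = LAMED + SHEVA + HIRIQ         # already in canonical order
print(codepoints(in_order), "->", codepoints(nfc(in_order)))

joined = LAMED + PATAH + CGJ + HIRIQ     # CGJ has class 0, so it blocks the swap
print(codepoints(joined), "->", codepoints(nfc(joined)))
```

Because hiriq has a lower combining class than patah, any normaliser will place it first, which is why the typed order cannot survive normalisation without some extra device.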

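The behaviour that the quoted proposal hopes for can be modelled with a toy normaliser. Everything specific here is hypothetical: HEBREW LIGATURE PATAH HIRIQ does not exist in Unicode, the private-use code point U+E000 merely stands in for it, and toy_nfc and toy_nfd are illustrative helpers, not the real Unicode composition algorithm. The sketch only shows the intended reorder-then-compose sequence; it does not establish that real NFC would compose such a pair.

```python
import unicodedata

LAMED = "\u05DC"  # HEBREW LETTER LAMED, used here only as a base letter
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ (combining class 14)
PATAH = "\u05B7"  # HEBREW POINT PATAH (combining class 17)

# Hypothetical stand-in for the proposed ligature character. U+E000 is a
# private-use code point; no such Hebrew character is actually encoded.
LIGATURE = "\uE000"


def toy_nfd(text):
    """Expand the stand-in ligature to hiriq + patah, then apply real NFD,
    which puts the vowel points into canonical order."""
    expanded = text.replace(LIGATURE, HIRIQ + PATAH)
    return unicodedata.normalize("NFD", expanded)


def toy_nfc(text):
    """Model the proposal: canonical reordering first, then composition of
    the reordered hiriq + patah pair into the stand-in ligature."""
    return toy_nfd(text).replace(HIRIQ + PATAH, LIGATURE)


typed = LAMED + PATAH + HIRIQ          # what the user types
composed = toy_nfc(typed)              # lamed + stand-in ligature
decomposed = toy_nfd(composed)         # lamed + hiriq + patah

print([f"U+{ord(c):04X}" for c in composed])
print([f"U+{ord(c):04X}" for c in decomposed])

# Text entered as hiriq then patah collapses to the same result, which is
# why the scheme depends on the reverse sequence never being needed.
print(toy_nfc(LAMED + HIRIQ + PATAH) == composed)
```

In this model both input orders end up as the same string, so the two could no longer be told apart after normalisation.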

This archive was generated by hypermail 2.1.5 : Tue Jul 08 2003 - 11:41:06 EDT