Re: Yerushala(y)im - or Biblical Hebrew

From: Peter Kirk (peter.r.kirk@ntlworld.com)
Date: Thu Jul 24 2003 - 09:06:44 EDT

Next message: Cathy Wissink: "RE: Code Pages!"

Previous message: Michael \(michka\) Kaplan: "Re: Vurtual Keyboard!"
In reply to: Peter_Constable@sil.org: "Re: Yerushala(y)im - or Biblical Hebrew"
Next in thread: Mark Davis: "Re: Yerushala(y)im - or Biblical Hebrew"

On 24/07/2003 05:31, Peter_Constable@sil.org wrote:

>One thought: Ken has suggested CGJ be used to prevent reordering of
>combining marks in fixed position classes such as the Hebrew vowels, and
>also suggested that users should not need to be aware of the need for CGJ
>for this purpose but that software can be implemented in a way that hides
>that detail. I'm not sure how that will work, but it's making me wonder if
>effectively we'd be looking at some amendment to the normalization
>algorithms to insert CGJ in certain enumerated contexts.
>
>
>- Peter
>
>
>---------------------------------------------------------------------------
>Peter Constable
>
>Non-Roman Script Initiative, SIL International
>7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
>Tel: +1 972 708 7485
>
So you mean, for example, that patah - hiriq normalises not to hiriq -
patah but to patah - CGJ - hiriq? This certainly looks like an
interesting idea.
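
To make the reordering concrete, here is a minimal Python sketch using the standard unicodedata module. The lamed base letter and the variable names are my own choices for illustration. The CGJ behaviour shown is only what the existing canonical ordering does with a class-0 character placed by hand; it is not an amended normalisation algorithm that inserts CGJ by itself.

```python
import unicodedata

# Illustrative code points (the base letter is an arbitrary choice).
LAMED = "\u05DC"  # HEBREW LETTER LAMED
PATAH = "\u05B7"  # HEBREW POINT PATAH, combining class 17
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ, combining class 14
CGJ = "\u034F"    # COMBINING GRAPHEME JOINER, combining class 0

plain = LAMED + PATAH + HIRIQ          # patah - hiriq
guarded = LAMED + PATAH + CGJ + HIRIQ  # patah - CGJ - hiriq

nfc_plain = unicodedata.normalize("NFC", plain)
nfc_guarded = unicodedata.normalize("NFC", guarded)

def describe(text):
    """Return (character name, combining class) pairs for each code point."""
    return [(unicodedata.name(ch), unicodedata.combining(ch)) for ch in text]

# Without CGJ the two vowels are sorted by combining class;
# with CGJ between them the original order is kept.
print(describe(nfc_plain))
print(describe(nfc_guarded))
```

Because CGJ has combining class 0, it splits the run of marks into two separate sequences for canonical ordering, which is why the vowels on either side of it are never swapped.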

As hiriq - patah remains a valid normalised form, normalisation
stability is not compromised. I wonder if it might violate the following
extract from the stability policy because CGJ is not a valid character
in Unicode 3.1, and so introducing it into the string during
normalisation means that the string is not valid in Unicode 3.1:

If a string contains only characters from a given version* of the
Unicode Standard (e.g., Unicode 3.1.1), and it is put into a normalized
form in accordance with that version of Unicode, then it will be in
normalized form according to any past or future versions of Unicode.

But the problem is that this paragraph is self-contradictory, or else it
implies that no characters may be added to Unicode. Take any string
containing only characters from Unicode 4.0, some of which are new in
Unicode 4.0, and normalise it according to Unicode 4.0. This string will
not be normalised according to versions of Unicode before 4.0, because it
includes characters not defined in those earlier versions.
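
A rough way to see the point, assuming Python's unicodedata module: it ships a frozen Unicode 3.2.0 database as unicodedata.ucd_3_2_0 alongside the current one. No 3.1 or 4.0 snapshot is available there, so "3.2 versus the current database" stands in here for "before 4.0 versus 4.0". U+0221 is used as a sample character that first appeared in Unicode 4.0, and the helper name only_assigned is hypothetical.

```python
import unicodedata

OLD = unicodedata.ucd_3_2_0  # frozen Unicode 3.2.0 database
NEW = unicodedata            # database of the running Python version

def only_assigned(text, db):
    """True if every code point in text is assigned in the given database."""
    return all(db.category(ch) != "Cn" for ch in text)

# "d" followed by U+0221 LATIN SMALL LETTER D WITH CURL (new in Unicode 4.0).
sample = "d\u0221"

print("assigned in current database:", only_assigned(sample, NEW))
print("assigned in 3.2.0 database:  ", only_assigned(sample, OLD))
```

A string like this can be normalised under the newer version, yet the older version cannot even classify one of its characters, which is the difficulty with reading "any past ... versions" literally.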

-- 
Peter Kirk
peter.r.kirk@ntlworld.com
http://web.onetel.net.uk/~peterkirk/

This archive was generated by hypermail 2.1.5 : Thu Jul 24 2003 - 09:52:52 EDT