Re: Yerushala(y)im - or Biblical Hebrew

From: Peter Kirk (peter.r.kirk@ntlworld.com)
Date: Thu Jul 24 2003 - 09:06:44 EDT

    On 24/07/2003 05:31, Peter_Constable@sil.org wrote:

    >One thought: Ken has suggested CGJ be used to prevent reordering of
    >combining marks in fixed position classes such as the Hebrew vowels, and
    >also suggested that users should not need to be aware of the need for CGJ
    >for this purpose but that software can be implemented in a way that hides
    >that detail. I'm not sure how that will work, but it's making me wonder if
    >effectively we'd be looking at some amendment to the normalization
    >algorithms to insert CGJ in certain enumerated contexts.
    >
    >
    >- Peter
    >
    >
    >---------------------------------------------------------------------------
    >Peter Constable
    >
    >Non-Roman Script Initiative, SIL International
    >7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
    >Tel: +1 972 708 7485
    >
    So you mean, for example, that patah - hiriq normalises not to hiriq -
    patah but to patah - CGJ - hiriq? This certainly looks like an
    interesting idea.
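
    The reordering at issue can be sketched with Python's standard
    `unicodedata` module. Plain NFC turns lamed + patah + hiriq into
    lamed + hiriq + patah, because patah has canonical combining class 17
    and hiriq has class 14. A CGJ (class 0) between the two points blocks
    the swap. The `protect_order` function is hypothetical and not part of
    any Unicode algorithm. It is one guess at the kind of amendment
    described above: insert CGJ wherever canonical reordering would
    otherwise swap two adjacent marks. It assumes the input contains no
    precomposed characters that decompose into combining marks.

```python
import unicodedata

LAMED = "\u05DC"  # HEBREW LETTER LAMED
PATAH = "\u05B7"  # HEBREW POINT PATAH, canonical combining class 17
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ, canonical combining class 14
CGJ = "\u034F"    # COMBINING GRAPHEME JOINER, canonical combining class 0


def protect_order(s: str) -> str:
    """Hypothetical pre-pass, not part of any Unicode algorithm.

    Inserts CGJ between two adjacent combining marks whenever canonical
    reordering would otherwise swap them (first class > second class > 0).
    Every run of non-zero classes then ends up non-decreasing, so
    canonical reordering leaves the typed order alone. Assumes s holds no
    precomposed characters that decompose into combining marks.
    """
    out: list[str] = []
    for ch in s:
        cur = unicodedata.combining(ch)
        if out and cur != 0 and unicodedata.combining(out[-1]) > cur:
            out.append(CGJ)
        out.append(ch)
    return "".join(out)


def names(s: str) -> list[str]:
    return [unicodedata.name(c) for c in s]


typed = LAMED + PATAH + HIRIQ  # the order the user typed

# Plain NFC: canonical reordering moves hiriq (14) before patah (17).
print(names(unicodedata.normalize("NFC", typed)))
# ['HEBREW LETTER LAMED', 'HEBREW POINT HIRIQ', 'HEBREW POINT PATAH']

# With CGJ inserted first, NFC keeps patah before hiriq.
protected = protect_order(typed)
print(names(unicodedata.normalize("NFC", protected)))
# ['HEBREW LETTER LAMED', 'HEBREW POINT PATAH', 'COMBINING GRAPHEME JOINER', 'HEBREW POINT HIRIQ']
```

    Because `protect_order` leaves hiriq - patah untouched, text that is
    already in canonical order comes out unchanged.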

    Since hiriq - patah remains a valid normalised form, text already
    normalised under the current rules stays normalised, so normalisation
    stability is not compromised. I wonder, though, whether it might violate
    the following extract from the stability policy: CGJ is not a valid
    character in Unicode 3.1, so inserting it into a string during
    normalisation produces a string that is not valid in Unicode 3.1:

    If a string contains only characters from a given version* of the
    Unicode Standard (e.g., Unicode 3.1.1), and it is put into a normalized
    form in accordance with that version of Unicode, then it will be in
    normalized form according to any past or future versions of Unicode.

    But the problem is that this paragraph is self-contradictory, or else it
    implies that no characters may be added to Unicode. To see why, take any
    string containing only characters from Unicode 4.0, some of which are new
    in Unicode 4.0, and normalise it according to Unicode 4.0. This string
    will not be normalised according to versions of Unicode before 4.0,
    because it includes characters not defined in those versions.

    -- 
    Peter Kirk
    peter.r.kirk@ntlworld.com
    http://web.onetel.net.uk/~peterkirk/
    


    This archive was generated by hypermail 2.1.5 : Thu Jul 24 2003 - 09:52:52 EDT