Re: Yerushala(y)im - or Biblical Hebrew

From: Kenneth Whistler (kenw@sybase.com)
Date: Fri Jul 25 2003 - 21:50:05 EDT


    Peter wrote:

    > One thought: Ken has suggested CGJ be used to prevent reordering of
    > combining marks in fixed position classes such as the Hebrew vowels, and
    > also suggested that users should not need to be aware of the need for CGJ
    > for this purpose but that software can be implemented in a way that hides
    > that detail. I'm not sure how that will work,

    Details TBD, of course, but the essence of it is that you
    want the user experience of inserting patah + hiriq
    to correspond to the backing store insertion of <patah, CGJ, hiriq>,
    without requiring users to know about, or type, a "CGJ"
    key. There are various input and editing strategies to accomplish
    this; effectively the problem is similar to the other cases where
    hidden characters have to be tucked away in the backing store for
    bidirectional text.
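Here is a minimal sketch of one such input strategy. It assumes an editor that appends one character at a time. The function `insert_mark` and the constant names are hypothetical and do not come from any real editor API. The sketch relies on the canonical combining classes exposed by Python's `unicodedata`: patah (U+05B7) has class 17 and hiriq (U+05B4) has class 14, so canonical reordering would swap them. CGJ (U+034F) has class 0, which blocks that reordering.

```python
import unicodedata

CGJ = "\u034F"    # COMBINING GRAPHEME JOINER, combining class 0
LAMED = "\u05DC"  # HEBREW LETTER LAMED
PATAH = "\u05B7"  # HEBREW POINT PATAH, combining class 17
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ, combining class 14


def insert_mark(text: str, mark: str) -> str:
    """Hypothetical editor hook: append a character typed by the user.

    If the new mark has a lower (nonzero) combining class than the mark
    right before it, normalization would reorder the two, so a CGJ is
    silently inserted between them to preserve the typed order.
    """
    new_ccc = unicodedata.combining(mark)
    prev_ccc = unicodedata.combining(text[-1]) if text else 0
    if new_ccc != 0 and prev_ccc > new_ccc:
        return text + CGJ + mark
    return text + mark


# The user types lamed, patah, hiriq; the backing store gets the CGJ.
stored = insert_mark(insert_mark(LAMED, PATAH), HIRIQ)
```

Here `stored` is <lamed, patah, CGJ, hiriq>, and NFC leaves it unchanged. The same three characters without the CGJ come out of NFC as <lamed, hiriq, patah>. Only the immediately preceding character needs to be checked, because each mark run built this way is already in non-decreasing class order up to the last CGJ.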

    The situation for searching is a little different. While the
    editing tools may be smart about the Biblical Hebrew points,
    a typical query widget might not be, so in that instance you
    want a query on <patah, hiriq> to match the repository store
    instance of <patah, CGJ, hiriq>. Well, format controls and
    some other characters (including CGJ) are ordinarily supposed to
    be ignored for searching, unless you have specialized tailorings
    for them. So the ordinary strategy would be to keep the
    repository normalized, and then, before local comparison against
    the query string, strip out the CGJ for the match. The
    situation is more complicated if the query string doesn't
    use a CGJ *and* gets normalized. In that situation the query
    loses the distinction in order (normalization turns
    <patah, hiriq> into <hiriq, patah>), so the search strategy
    should be to strip out the CGJ locally and renormalize, which
    puts both sides into the same canonical order. That
    could result in false positive matches, but at
    least you will find what you were looking for.
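The strip-and-renormalize strategy can be sketched as follows. The sketch assumes plain substring matching rather than a collation-based search, and it assumes NFC as the repository's normalization form. The names `search_key` and `matches` are hypothetical.

```python
import unicodedata

CGJ = "\u034F"    # COMBINING GRAPHEME JOINER
LAMED = "\u05DC"  # HEBREW LETTER LAMED
PATAH = "\u05B7"  # HEBREW POINT PATAH
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ


def search_key(s: str) -> str:
    """Strip CGJ, then renormalize (NFC assumed) so that marks the CGJ
    was protecting fall back into canonical order."""
    return unicodedata.normalize("NFC", s.replace(CGJ, ""))


def matches(query: str, stored: str) -> bool:
    """Compare on search keys, so the query's normalization state and the
    presence or absence of CGJ in the store no longer matter."""
    return search_key(query) in search_key(stored)
```

With these definitions, `matches(LAMED + PATAH + HIRIQ, LAMED + PATAH + CGJ + HIRIQ)` is true whether or not the query was normalized first. `matches(LAMED + PATAH + HIRIQ, LAMED + HIRIQ + PATAH)` is also true, which is exactly the kind of false positive described above.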

    > but it's making me wonder if
    > effectively we'd be looking at some amendment to the normalization
    > algorithms to insert CGJ in certain enumerated contexts.

    No.

    --Ken



    This archive was generated by hypermail 2.1.5 : Fri Jul 25 2003 - 22:28:06 EDT