RE: Yerushala(y)im - or Biblical Hebrew

From: Jony Rosenne
Date: Sat Jul 26 2003 - 02:27:48 EDT


    I don't think that it is important that the user not be aware of the
    encoding, since it is only intended for Biblical scholars.


    > -----Original Message-----
    > From: Kenneth Whistler
    > Sent: Saturday, July 26, 2003 3:50 AM
    > Subject: Re: Yerushala(y)im - or Biblical Hebrew
    >
    > Peter wrote:
    >
    > > One thought: Ken has suggested CGJ be used to prevent reordering of
    > > combining marks in fixed position classes such as the Hebrew vowels,
    > > and also suggested that users should not need to be aware of the need
    > > for CGJ for this purpose but that software can be implemented in a way
    > > that hides that detail. I'm not sure how that will work,
    >
    > Details TBD, of course, but the essence of it is that you
    > want the user experience of inserting patah + hiriq
    > to correspond to the backing store insertion of <patah, CGJ,
    > hiriq>, without making them explicitly have to know about or
    > type a "CGJ" key. There are various input and editing
    > strategies to accomplish this -- effectively the problem is
    > similar to other needs to tuck hidden characters away in the
    > backing store for bidirectional text.
    >
    > The situation for searching is a little different. While the
    > editing tools may be smart about the Biblical Hebrew points,
    > a typical query widget might not, so in that instance, you
    > want a query on <patah, hiriq> to match the repository store
    > instance of <patah, CGJ, hiriq>. Well, format controls and
    > some other characters (including CGJ) are ordinarily supposed
    > to be ignored for searching -- unless you have specialized
    > tailorings for them. So the ordinary strategy would be to
    > keep the repository normalized, and then before local
    > comparison against the query string, strip out the CGJ for
    > the match. The situation is more complicated if the query
    > string doesn't use a CGJ *and* gets normalized. In that
    > situation, you lose the distinction in order, of course, but
    > the search strategy should be to strip out the CGJ locally
    > and renormalize. That could result in false positive matches,
    > of course, but at least you will find what you were looking for.
    >
    > > but it's making me wonder if
    > > effectively we'd be looking at some amendment to the normalization
    > > algorithms to insert CGJ in certain enumerated contexts.
    >
    > No.
    >
    > --Ken
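
The backing-store idea in the quoted message (the user types patah + hiriq,
the store holds <patah, CGJ, hiriq>) can be illustrated with a minimal Python
sketch. The helper name protect_mark_order and the rule it applies (insert
CGJ whenever a combining mark follows one with a higher combining class) are
assumptions made for illustration only; the thread leaves the actual input
strategy as "details TBD".

```python
import unicodedata

LAMED = "\u05DC"  # HEBREW LETTER LAMED, a base letter
PATAH = "\u05B7"  # HEBREW POINT PATAH, canonical combining class 17
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ, canonical combining class 14
CGJ = "\u034F"    # COMBINING GRAPHEME JOINER, combining class 0


def protect_mark_order(typed: str) -> str:
    """Hypothetical input-side helper: insert CGJ between two adjacent
    combining marks that normalization would otherwise swap."""
    out = []
    for ch in typed:
        if out:
            prev_ccc = unicodedata.combining(out[-1])
            ccc = unicodedata.combining(ch)
            # A mark with a lower nonzero class after a higher one would be
            # reordered by canonical ordering; CGJ (class 0) blocks that.
            if ccc != 0 and prev_ccc > ccc:
                out.append(CGJ)
        out.append(ch)
    return "".join(out)


typed = LAMED + PATAH + HIRIQ          # what the user keys in
stored = protect_mark_order(typed)     # what goes into the backing store

# Without CGJ, normalization puts hiriq before patah.
reordered = unicodedata.normalize("NFC", typed)
# With CGJ, the typed order survives normalization.
preserved = unicodedata.normalize("NFC", stored)
```

The user never types CGJ here; the helper adds it only where normalization
would otherwise change the order of the points.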

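The search strategy Ken describes (strip CGJ locally, renormalize, then
compare) might look like the following sketch. The function names search_key
and matches are hypothetical, and the sketch always renormalizes both sides,
which is the more tolerant of the two cases in the quoted message; as Ken
notes, that gives up the distinction in mark order and so allows false
positives.

```python
import unicodedata

LAMED = "\u05DC"  # HEBREW LETTER LAMED
PATAH = "\u05B7"  # HEBREW POINT PATAH, combining class 17
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ, combining class 14
CGJ = "\u034F"    # COMBINING GRAPHEME JOINER


def search_key(text: str) -> str:
    """Hypothetical helper: drop CGJ, then renormalize so that mark order
    is canonical on both sides of the comparison."""
    return unicodedata.normalize("NFC", text.replace(CGJ, ""))


def matches(query: str, stored: str) -> bool:
    """Substring match that ignores CGJ and canonical mark order."""
    return search_key(query) in search_key(stored)


stored = LAMED + PATAH + CGJ + HIRIQ   # repository text, order protected

raw_query = PATAH + HIRIQ                              # typed without CGJ
normalized_query = unicodedata.normalize("NFC", raw_query)  # hiriq, patah

found_raw = matches(raw_query, stored)
found_normalized = matches(normalized_query, stored)

# False positive: text that really has hiriq before patah also matches.
other = LAMED + HIRIQ + PATAH
false_positive = matches(raw_query, other)
```

The stored text itself is left untouched; CGJ is removed only in the
temporary keys used for the comparison.
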
    This archive was generated by hypermail 2.1.5 : Sat Jul 26 2003 - 02:07:24 EDT