From: John Hudson (tiro@tiro.com)
Date: Sat Jun 28 2003 - 02:47:57 EDT
At 07:10 PM 6/27/2003, Kenneth Whistler wrote:
>Why? The point is that:
>
> <patah, CGJ, hiriq>
>
>is one thing, and
>
> <hiriq, CGJ, patah>
>
>is another. You *want* those sequences to be distinct, right? Even
>if the text has been normalized, right? That was the whole
>problem with:
>
> <patah, hiriq>
> <hiriq, patah>
>
>which are canonically equivalent, since they both normalize to:
>
> <hiriq, patah>
>
>So the CGJ *is* significant for searching (and sorting). If you
>want one sequence, you search for <patah, CGJ, hiriq>, if you
>want the other, you search for <hiriq, CGJ, patah>. If you
>don't care, and want to find either, *then* you strip out the
>CGJ and normalize before comparison.
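
The behaviour described in the quoted text can be checked with a short sketch. It assumes Python's standard `unicodedata` module and uses NFC, though any normalization form applies the same canonical reordering. The constant names and the `nfc` helper are illustrative only and come from neither message.

```python
import unicodedata

# Illustrative names for the code points under discussion.
PATAH = "\u05B7"  # HEBREW POINT PATAH, canonical combining class 17
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ, canonical combining class 14
CGJ = "\u034F"    # COMBINING GRAPHEME JOINER, canonical combining class 0


def nfc(s: str) -> str:
    """Return the NFC form of s."""
    return unicodedata.normalize("NFC", s)


# Without CGJ, canonical reordering sorts both sequences to <hiriq, patah>.
plain_equal = nfc(PATAH + HIRIQ) == nfc(HIRIQ + PATAH)

# CGJ has combining class 0, so it blocks reordering and the two stay distinct.
cgj_equal = nfc(PATAH + CGJ + HIRIQ) == nfc(HIRIQ + CGJ + PATAH)

print(plain_equal, cgj_equal)
```

Under these assumptions the first comparison is true and the second is false, which is the distinction the CGJ is meant to preserve.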
I think Peter's point may be that scholars searching for patah followed by
hiriq are most likely to search for <patah, hiriq>, and frankly who can
blame them? This is what they see in the printed text, and it is what,
hopefully, they would be able to input. So again we're looking at a
solution that is only as attractive as the ability to hide it from users.
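
To make the user-facing concern concrete, here is a rough sketch of what happens when a query typed as plain <patah, hiriq> is run against text stored with the CGJ. It assumes Python's `unicodedata` module. The `exact_find` and `loose_find` helpers and the sample strings are hypothetical, invented for illustration, and do not describe any existing search tool.

```python
import unicodedata

LAMED = "\u05DC"  # HEBREW LETTER LAMED, used here only as a base letter
PATAH = "\u05B7"  # HEBREW POINT PATAH
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ
CGJ = "\u034F"    # COMBINING GRAPHEME JOINER


def exact_find(text: str, query: str) -> bool:
    """Normalize both sides, keep any CGJ, then do a substring test."""
    norm = lambda s: unicodedata.normalize("NFC", s)
    return norm(query) in norm(text)


def loose_find(text: str, query: str) -> bool:
    """Strip CGJ and normalize both sides before the substring test."""
    fold = lambda s: unicodedata.normalize("NFC", s.replace(CGJ, ""))
    return fold(query) in fold(text)


stored_ph = LAMED + PATAH + CGJ + HIRIQ  # text encoded as <patah, CGJ, hiriq>
stored_hp = LAMED + HIRIQ + CGJ + PATAH  # text encoded as <hiriq, CGJ, patah>
typed = PATAH + HIRIQ                    # what a user would plausibly type

print(exact_find(stored_ph, typed))  # the plain query misses the CGJ text
print(loose_find(stored_ph, typed))  # the loose search finds it
print(loose_find(stored_hp, typed))  # but it also matches the other order
```

In this sketch the plain query fails the exact search, and the loose search can no longer tell the two orders apart, so the software would have to insert the CGJ into the query on the user's behalf.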
I am working on some exhaustive documentation of the normalisation problems
affecting Hebrew mark ordering, which will ensure that we have a good grasp
of the extent of the problem and a clear view of all the permutations that
need to be taken into account by any solution.
John Hudson
Tiro Typeworks www.tiro.com
Vancouver, BC tiro@tiro.com
If you browse in the shelves that, in American bookstores,
are labeled New Age, you can find there even Saint Augustine,
who, as far as I know, was not a fascist. But combining Saint
Augustine and Stonehenge -- that is a symptom of Ur-Fascism.
- Umberto Eco