Re: Merging combining classes, was: New contribution N2676

From: Peter Kirk (peterkirk@qaya.org)
Date: Mon Oct 27 2003 - 06:48:48 CST


On 26/10/2003 19:58, John Hudson wrote:

> ...
> Functionally, inserting a CGJ here resolves the problem fine. I'm just
> not convinced that CGJ is a good general solution to the normalisation
> problem: it works, but it requires deliberate insertion in every place
> where unwanted mark re-ordering may occur. If I have some free time
> over the next while, I'll try to figure out just how many places in
> the Bible text this would be needed: I suspect it is quite a lot. Of
> course, if you automatically insert CGJ after every mark, you are
> sure that re-ordering will not take place, but you also lose any
> benefit of normalisation.
>
> John Hudson
>
CGJ is likely to be needed:

1) whenever two vowels come together in non-canonical order:
approximately 638 times in the WTS eBHS text of the Hebrew Bible (over 5
MB of UTF-8), with little variation in other texts - all but two of
these cases are in Yerushala(y)im;

2) according to my proposal, for every occurrence of right meteg:
approximately 905 times in eBHS but with a potentially large variation
between texts;

3) possibly also for every occurrence of medial meteg: approximately 78
times in eBHS.
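As a rough illustration of how a count like the one in (1) could be
obtained, here is a minimal Python sketch. It is not the script behind
the figures above. It assumes that "vowel" means any Hebrew point with a
canonical combining class from 10 (sheva) to 20 (qubuts), and the
function name and sample string are made up for the example.

```python
import unicodedata

# Assumption: Hebrew vowel points are the marks with combining classes 10..20.
HEBREW_VOWEL_CLASSES = range(10, 21)

def count_noncanonical_vowel_pairs(text: str) -> int:
    """Count adjacent vowel pairs that normalisation would swap."""
    count = 0
    prev_ccc = 0  # combining class of the previous character
    for ch in text:
        ccc = unicodedata.combining(ch)
        if (prev_ccc in HEBREW_VOWEL_CLASSES
                and ccc in HEBREW_VOWEL_CLASSES
                and prev_ccc > ccc):
            count += 1
        prev_ccc = ccc
    return count

# Hypothetical sample: lamed + patah + hiriq (non-canonical order),
# lamed + hiriq + patah (canonical), lamed + patah + CGJ + hiriq (protected).
sample = "\u05DC\u05B7\u05B4 \u05DC\u05B4\u05B7 \u05DC\u05B7\u034F\u05B4"
print(count_noncanonical_vowel_pairs(sample))
```

Only the first of the three sample sequences is counted: the second is
already in canonical order, and in the third the CGJ (combining class 0)
separates the two vowels.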

Philippe made a good point that the ordering of combining characters
relative to CGJ needs to be constrained by a spelling convention,
because normalisation cannot constrain it. But the ordering chosen
should follow the logic of the language.

In the case of Yerushalayim, the second vowel is somehow auxiliary and
relates to an omitted consonant, whereas the first vowel and the accent
(often but not always present) go with the lamed which is written. So in
this case the appropriate order is <base character, vowel1, accent, CGJ,
vowel2>. In the odd case of two vowels and two accents on one base
character in Exodus 20:4 (see
http://www.qaya.org/academic/hebrew/Issues-Hebrew-Unicode.html section
3.2), the most logical order is actually <base character, vowel1,
accent1, CGJ, vowel2, accent2>, because the second accent (geresh) goes
with the second vowel (patah).
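The effect of the proposed order <base character, vowel1, accent, CGJ,
vowel2> can be checked with a small Python sketch using the standard
unicodedata module. The choice of patah and hiriq as the two vowels and
of merkha as the accent is only an assumption for illustration, as are
the variable names.

```python
import unicodedata

LAMED = "\u05DC"   # base character
PATAH = "\u05B7"   # vowel1, combining class 17
HIRIQ = "\u05B4"   # vowel2, combining class 14
MERKHA = "\u05A5"  # an accent, combining class 220 (illustrative choice)
CGJ = "\u034F"     # COMBINING GRAPHEME JOINER, combining class 0

def nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)

def code_points(s: str) -> str:
    return " ".join(f"U+{ord(c):04X}" for c in s)

without_cgj = LAMED + PATAH + MERKHA + HIRIQ
with_cgj = LAMED + PATAH + MERKHA + CGJ + HIRIQ

# Without CGJ the marks are sorted by combining class, so hiriq moves
# in front of patah and the intended vowel order is lost.
print(code_points(nfc(without_cgj)))

# With CGJ the second vowel is in a separate run of marks and stays put.
print(code_points(nfc(with_cgj)))
```

In the first case normalisation yields <lamed, hiriq, patah, merkha>; in
the second the sequence comes back unchanged.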

The situation is rather different for right meteg, if CGJ is used for
this: right meteg is always written to the right of all other combining
marks, and the other marks stay in their regular positions. So the most
logical ordering would be <base character, meteg, CGJ, vowel, accent>.
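The same kind of check applies to <base character, meteg, CGJ, vowel,
accent>. In this sketch the base (bet), vowel (qamats) and accent
(merkha) are arbitrary choices made for the example, not taken from any
particular verse.

```python
import unicodedata

BET = "\u05D1"     # base character (illustrative choice)
METEG = "\u05BD"   # combining class 22
QAMATS = "\u05B8"  # vowel, combining class 18
MERKHA = "\u05A5"  # accent, combining class 220
CGJ = "\u034F"     # combining class 0, blocks reordering across it

def nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)

right_meteg = BET + METEG + CGJ + QAMATS + MERKHA
unprotected = BET + METEG + QAMATS + MERKHA
ordinary_meteg = BET + QAMATS + METEG + MERKHA

# With CGJ, meteg stays in front of the vowel after normalisation.
print(nfc(right_meteg) == right_meteg)

# Without CGJ, meteg (22) is sorted after the vowel (18), so the result
# is identical to the encoding of an ordinary meteg.
print(nfc(unprotected) == ordinary_meteg)
```

Both comparisons hold, which is why the distinction between right meteg
and ordinary meteg survives normalisation only when CGJ is present.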

-- 
Peter Kirk
peter@qaya.org (personal)
peterkirk@qaya.org (work)
http://www.qaya.org/

