Re: Hebrew collation, was: Merging combining classes

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Tue Oct 28 2003 - 05:47:45 CST


From: "Peter Kirk" <peterkirk@qaya.org>
> I know there was quite a lot of discussion of collation of Hebrew in
> August, confused partly because it was spread over three lists (unicode,
> bidi and hebrew). I don't think we found a good solution then except to
> define as contractions each of several hundred possible combinations
> following a shin.
>
> I wonder if it might work (either in DUCET or in a tailored collation)
> to make the Hebrew vowel distinctions a third level sort, with the
> consonant modifiers dagesh, rafe and sin and shin dot at the second
> level, and accents at the fourth level. Contractions could then be made
> for dagesh, rafe and sin/shin dot so that the latter, which follows in
> the canonical order, will be collated as if coming first; and there are
> not many combinations, although we do have to allow for intervening
> meteg, which has fourth level significance.

Exactly, this will work, provided that the consonant modifiers are grouped
together and that the canonical order imposed by normalization does not
separate them with vowels or vowel modifiers (accents).
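
As a quick check of that condition, the canonical combining classes of the
points can be read from Python's standard unicodedata module. This is only a
sketch: the names POINTS, combining_classes and canonical_order are made up
for the illustration. In the Unicode Character Database the Hebrew vowels
occupy classes 10 to 20, dagesh is 21, meteg 22, rafe 23, shin dot 24 and
sin dot 25, so the vowels sort before all consonant modifiers and meteg falls
between dagesh and the others.

```python
import unicodedata

# Hebrew points discussed in this thread (hypothetical helper table).
POINTS = {
    "DAGESH": "\u05BC",
    "METEG": "\u05BD",
    "RAFE": "\u05BF",
    "SHIN DOT": "\u05C1",
    "SIN DOT": "\u05C2",
    "QAMATS": "\u05B8",
}


def combining_classes(points=POINTS):
    """Map each point name to its canonical combining class."""
    return {name: unicodedata.combining(ch) for name, ch in points.items()}


def canonical_order(text):
    """Return the text in canonical (NFD) order."""
    return unicodedata.normalize("NFD", text)


if __name__ == "__main__":
    # List the points by increasing combining class.
    for name, ccc in sorted(combining_classes().items(), key=lambda kv: kv[1]):
        print(f"{ccc:3d}  {name}")

    # SHIN + SHIN DOT + QAMATS as typed; NFD moves the vowel before the dot.
    typed = "\u05E9\u05C1\u05B8"
    print([f"{ord(c):04X}" for c in canonical_order(typed)])
```

So after normalization a vowel always precedes the shin or sin dot, and a
meteg can sit between a dagesh and a following dot, which is why any
contraction for these marks has to allow for intervening characters.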

I don't know all the details of Hebrew points, so it would help if you
could list exhaustively which category each of these points belongs to:

1) consonant modifiers: dagesh, rafe, sin dot, shin dot
> 05BC ; [.0000.00BD.0002.05BC] # HEBREW POINT DAGESH OR MAPIQ
> 05BF ; [.0000.00C0.0002.05BF] # HEBREW POINT RAFE
> 05C1 ; [.0000.00C1.0002.05C1] # HEBREW POINT SHIN DOT
> 05C2 ; [.0000.00C2.0002.05C2] # HEBREW POINT SIN DOT

2) vowels?
3) vowel modifiers/accents?

> 05B0 ; [.0000.0000.00B2.05B0] # HEBREW POINT SHEVA
> 05B1 ; [.0000.0000.00B3.05B1] # HEBREW POINT HATAF SEGOL
> 05B2 ; [.0000.0000.00B4.05B2] # HEBREW POINT HATAF PATAH
> 05B3 ; [.0000.0000.00B5.05B3] # HEBREW POINT HATAF QAMATS
> 05B4 ; [.0000.0000.00B6.05B4] # HEBREW POINT HIRIQ
> 05B5 ; [.0000.0000.00B7.05B5] # HEBREW POINT TSERE
> 05B6 ; [.0000.0000.00B8.05B6] # HEBREW POINT SEGOL
> 05B7 ; [.0000.0000.00B9.05B7] # HEBREW POINT PATAH
> 05B8 ; [.0000.0000.00BA.05B8] # HEBREW POINT QAMATS
> 05B9 ; [.0000.0000.00BB.05B9] # HEBREW POINT HOLAM
> 05BB ; [.0000.0000.00BC.05BB] # HEBREW POINT QUBUTS

Here you assign vowels and vowel modifiers the same collation level. Isn't
that a problem?
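
To make the level assignments in the quoted entries easier to compare, here
is a small Python sketch that reads lines in the format shown above and
reports the first level at which each character carries a non-zero weight.
It assumes that each entry holds three weights followed by a fourth field
repeating the code point, as in the quoted lines; parse_entry and
first_significant_level are hypothetical names, not part of any collation
library.

```python
import re

# One entry such as: "> 05BC ; [.0000.00BD.0002.05BC] # HEBREW POINT ..."
# The leading "> " quoting marker is optional.
ENTRY = re.compile(
    r"^(?:>\s*)?([0-9A-F]{4,6})\s*;\s*"
    r"\[[.*]([0-9A-F]{4})\.([0-9A-F]{4})\.([0-9A-F]{4})\.[0-9A-F]{4,6}\]"
)


def parse_entry(line):
    """Return (code_point, (level1, level2, level3)) or None if no match."""
    m = ENTRY.match(line.strip())
    if m is None:
        return None
    code = int(m.group(1), 16)
    weights = tuple(int(m.group(i), 16) for i in (2, 3, 4))
    return code, weights


def first_significant_level(weights):
    """Return the 1-based index of the first non-zero weight, or None."""
    for level, weight in enumerate(weights, start=1):
        if weight != 0:
            return level
    return None


if __name__ == "__main__":
    sample = [
        "> 05BC ; [.0000.00BD.0002.05BC] # HEBREW POINT DAGESH OR MAPIQ",
        "> 05B0 ; [.0000.0000.00B2.05B0] # HEBREW POINT SHEVA",
    ]
    for line in sample:
        code, weights = parse_entry(line)
        print(f"{code:04X} first differs at level {first_significant_level(weights)}")
```

Running this over the full list groups the points by the level at which they
first become significant, which shows at a glance which marks share a level.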


