Re: Combining classes of Hebrew accents

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Wed, 6 Aug 2014 00:24:35 +0200

This is a known issue that has existed for years (and that was discussed
heavily at the time). Unfortunately this CANNOT be changed and will never
be changed. Canonical combining classes are immutable and assigned forever
because they are character properties that MUST honor the stability
policies; notably, any change here would break the stability of all
normalization forms.
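
To make the effect concrete, here is a minimal sketch using Python's
standard unicodedata module (my choice for illustration; any conformant
normalizer should behave the same way). It prints the combining classes of
the accents in question and shows how canonical ordering swaps two of them.
The base letter BET is an arbitrary pick.

```python
import unicodedata

# Combining classes of one plain "below" accent and the three accents
# named in the proposal.
for cp in (0x0591, 0x059A, 0x05AD, 0x05AE):
    ch = chr(cp)
    print(f"U+{cp:04X} {unicodedata.name(ch)}: ccc={unicodedata.combining(ch)}")

BET = "\u05D1"      # arbitrary base letter, chosen only for the demo
YETIV = "\u059A"    # ccc 222 (below right)
ETNAHTA = "\u0591"  # ccc 220 (below)

# Typed with the "below right" accent first ...
typed = BET + YETIV + ETNAHTA
# ... but canonical ordering sorts marks by ascending combining class,
# so the class-220 accent ends up in front of the class-222 accent.
normalized = unicodedata.normalize("NFC", typed)
reordered = BET + ETNAHTA + YETIV

print(normalized == reordered)
```

Because both orders are canonically equivalent, a conformant process may
not treat them as different strings, which is why the classes cannot be
reassigned after the fact.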

In cases where this could cause problems because two Hebrew combining marks
with non-zero combining classes have a significant order, the only solution
is to separate them with a combining grapheme joiner (CGJ) to preserve
their relative order. For modern Hebrew this is not a problem and no CGJ
is needed: the existing canonical equivalence has no impact on the
interpretation, even if the relative order of combining marks produced by
normalization is not the most logical.
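
As a rough illustration of the CGJ technique, the sketch below (Python's
unicodedata again; the base letter and the particular pair of accents are
only assumptions for the demo) compares a sequence with and without
U+034F between the two marks. CGJ has combining class 0, so canonical
ordering never moves a mark across it.

```python
import unicodedata

CGJ = "\u034F"      # COMBINING GRAPHEME JOINER, combining class 0
BET = "\u05D1"      # arbitrary base letter for the demo
YETIV = "\u059A"    # ccc 222
ETNAHTA = "\u0591"  # ccc 220


def codepoints(s: str) -> str:
    """Render a string as space-separated hexadecimal code points."""
    return " ".join(f"{ord(c):04X}" for c in s)


plain = BET + YETIV + ETNAHTA
guarded = BET + YETIV + CGJ + ETNAHTA

plain_nfd = unicodedata.normalize("NFD", plain)
guarded_nfd = unicodedata.normalize("NFD", guarded)

print("plain:  ", codepoints(plain), "->", codepoints(plain_nfd))
print("guarded:", codepoints(guarded), "->", codepoints(guarded_nfd))
```

The first line of output should show the two accents swapped, the second
should show the guarded sequence unchanged. Note that the guarded string is
NOT canonically equivalent to the plain one, so comparisons and searches
have to account for the extra CGJ.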

All of this has been discussed. There will be no change. In fact the
situation is even more complex than you think when you look at Biblical
Hebrew, where there are some words in which multiple diacritics occur
over the same base letter and interact graphically in a complex way, and
where the encoding order is highly significant.

In fact the problematic cases are essentially the dot occurring in the
middle (dagesh), the sin/shin dots, and the historic cantillation marks
(which ideally should have been assigned combining class 0, but you can
still emulate that behavior by inserting a CGJ before these marks to make
sure they won't be reordered). There are also other specific issues for
Yiddish.
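
A possible way to automate this is sketched below. It is a variation on
the idea, not an established API: the helper name pin_mark_order is
hypothetical, and instead of prefixing every cantillation mark it inserts
a CGJ only between two adjacent marks that canonical ordering would
otherwise swap. It assumes the input contains no precomposed characters
(such as the Hebrew presentation forms), since their decomposition would
introduce marks the helper has not seen.

```python
import unicodedata

CGJ = "\u034F"  # COMBINING GRAPHEME JOINER, combining class 0


def pin_mark_order(text: str) -> str:
    """Insert CGJ wherever normalization would swap two adjacent marks.

    Hypothetical helper: a swap happens when a mark with a non-zero
    combining class follows a mark with a strictly higher class.
    """
    out = []
    prev_ccc = 0
    for ch in text:
        ccc = unicodedata.combining(ch)
        if prev_ccc > ccc > 0:
            out.append(CGJ)
        out.append(ch)
        prev_ccc = ccc
    return "".join(out)


SHIN = "\u05E9"
SHIN_DOT = "\u05C1"  # fixed class 24
DAGESH = "\u05BC"    # fixed class 21

typed = SHIN + SHIN_DOT + DAGESH
pinned = pin_mark_order(typed)

# Without CGJ the dagesh (21) is moved in front of the shin dot (24).
print(unicodedata.normalize("NFD", typed) == SHIN + DAGESH + SHIN_DOT)
# With CGJ the typed order survives normalization.
print(unicodedata.normalize("NFD", pinned) == pinned)
```

Sequences that are already in canonical order pass through untouched, so
text that does not need a CGJ does not get one.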

Some complex issues arise in the encoding of the Biblical name of
Yerushalayim, the name of God, and similar words with diphthongs
represented by multiple diacritics in a significant order (because there
are in fact some implied but unwritten base letters: the CGJ can be used
where the implied letter is missing to solve the issue).

2014-08-05 23:05 GMT+02:00 Maxim Iorsh <iorsh_at_users.sourceforge.net>:

> Hello,
>
> I propose to change combining classes for certain Hebrew accents.
>
> Presently, the Hebrew accents belong to one of the following classes: 220
> (below), 222 (below right), 228 (above left), 230 (above). Accordingly, the
> canonical ordering puts "below" accents before "below right" accents, for
> example.
>
> Unfortunately, the resulting order is wrong. As Hebrew is a right-to-left
> script, the accents which are located below the letter on the right should
> go *before* accents which reside below the letter in the middle. The same
> goes for accents above letters.
>
> My proposal is to modify the combining class property as follows:
>
> 059A HEBREW ACCENT YETIV: ccc=219 "Below_Right_RTL"
> 05AD HEBREW ACCENT DEHI: ccc=219 "Below_Right_RTL"
> 05AE HEBREW ACCENT ZINOR: ccc=231 "Above_Left_RTL"
>
> Alternatively, existing class 218 "Below_Left" could be assigned to 059A,
> 05AD and possibly renamed to "Below_Char_Start" or something similar, so
> that it means "left" for LTR scripts and "right" for RTL scripts. The class
> 232 "Above_Right" could be assigned to 05AE and renamed accordingly.
>
> Thank you,
> -- Maxim.
>
> P. S. In a related note, does anybody know why Hebrew marks (05B0-05C7)
> are assigned fixed combining classes? It looks like most of them would be
> perfectly ok with 220 "Below" class, or other appropriate non-fixed classes.
>
> _______________________________________________
> Unicode mailing list
> Unicode_at_unicode.org
> http://unicode.org/mailman/listinfo/unicode
>
>

Received on Tue Aug 05 2014 - 17:25:57 CDT
