Re: Merging combining classes, was: New contribution N2676

From: jim (jallan@smrtytrek.com)
Date: Tue Oct 28 2003 - 13:34:58 CST


Peter Kirk wrote:

> Also, in the commonly used Hebrew *transliteration*, the same function
> (fricative pronunciation) is indicated by a macron above g and p but
> below b, d, k and t, for the same reason. It occurs only with these
> letters (sometimes also written below h). There might be an argument for
> using, instead of g and p plus combining macron, g and p plus combining
> line below - especially as if these were ever capitalised the line would
> probably be moved below. But there would need to be a clear rule that
> such combining marks are moved from below to above g and p.

The argument makes sense, but I think the example should be reversed.

For _b_, _d_, _k_ and _t_, precomposed forms with an underbar already
appear in Unicode, in both capital and small forms, in the Latin Extended
Additional block, with canonical decompositions to the letter followed by
U+0331 COMBINING MACRON BELOW. For _h_ only a small form is encoded
(U+1E96 LATIN SMALL LETTER H WITH LINE BELOW).

I have seen transliterations that use an underbar with _p_ and _g_,
either below the descender or passing through it. Either rendering
could be produced by U+0331, depending on how the font handles this
diacritic on letters with descenders.

Accordingly, the Unicode analysis of such combinations fits my own
intuition that the low position is the normal position for a diacritic
bar indicating fricative pronunciation. The occasional placement of the
bar above the letter is an exception made for typographical reasons.

In meaning, this bar below a stop character is essentially identical to
a bar placed through a stop character. Barred stops are also seen, and a
few of them are encoded in Unicode (for example U+0180 LATIN SMALL
LETTER B WITH STROKE, U+0111 LATIN SMALL LETTER D WITH STROKE and U+01E5
LATIN SMALL LETTER G WITH STROKE). What we really have is a fricative
indication bar that can be placed under a stop character or through it
(sometimes diagonally) to indicate a fricative (or sometimes affricate)
pronunciation.
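The barred stops, by contrast, are encoded as atomic characters. A small Python sketch using the standard `unicodedata` module shows that they have no decomposition at all, so normalization never relates them to a letter followed by U+0331 or by any other combining mark.

```python
import unicodedata

# Stop letters with a bar through the stem: b, d and g with stroke.
STROKED = ["\u0180", "\u0111", "\u01E5"]

for ch in STROKED:
    # decomposition() returns "" when Unicode defines no decomposition,
    # and NFD then leaves the character unchanged.
    print(unicodedata.name(ch),
          repr(unicodedata.decomposition(ch)),
          unicodedata.normalize("NFD", ch) == ch)
```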

Unicode encodes U+1E20 and U+1E21 as uppercase and lowercase _g_ with
macron. They have canonical decompositions to _G_ or _g_ followed by
U+0304. Since U+0304 and U+0331 are distinct marks that are not
canonically equivalent, _g_ followed by U+0331 can never normalize to
U+1E21. This seems to rule out treating a bar above and a bar below as
variants of the same character within Unicode.

There is no composite character encoded for _p_ or _P_ with either
U+0304 or U+0331.
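The asymmetry between _g_ and _p_ can be seen directly under NFC normalization. A minimal Python sketch with `unicodedata.normalize` (the helper `nfc` is only a local shorthand), assuming the Unicode data bundled with a current Python:

```python
import unicodedata

def nfc(s: str) -> str:
    """Shorthand for NFC (canonical composition) normalization."""
    return unicodedata.normalize("NFC", s)

# g + U+0304 COMBINING MACRON composes to U+1E21 LATIN SMALL LETTER G WITH MACRON.
print(nfc("g\u0304") == "\u1E21")         # True
# g + U+0331 COMBINING MACRON BELOW has no precomposed form and stays a sequence...
print(nfc("g\u0331") == "g\u0331")        # True
# ...and it is not canonically equivalent to U+1E21.
print(nfc("g\u0331") == nfc("\u1E21"))    # False
# Neither p + U+0304 nor p + U+0331 composes to a single character.
print(len(nfc("p\u0304")), len(nfc("p\u0331")))  # 2 2
```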

IPA conventions also allow U+0325 COMBINING RING BELOW and some other
diacritics normally placed beneath a symbol to be displayed above it
instead, for typographical reasons, when the symbol has a descender (the
IPA chart's example is ŋ̊).

But the Unicode specifications currently say nothing about moving
below-diacritics to an above position for typographical reasons, except
for the combination of _g_ and cedilla. Of course, if diacritics are
moved in this way, combining classes are even more broken than they
already seem to be, since a mark's class reflects its nominal position,
not where it is actually drawn. This one exception is already a problem
in theory, though probably not in practice.
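Canonical combining classes can be inspected with `unicodedata.combining`. The sketch below prints the classes of the marks discussed here and shows that canonical ordering follows the class, not the visual position: a macron (class 230) typed before a macron below (class 220) is moved after it by NFD, and the cedilla in U+0123 LATIN SMALL LETTER G WITH CEDILLA keeps class 202 even in fonts that draw it above the _g_.

```python
import unicodedata

# Above/below pairs of the same shape, plus the cedilla.
MARKS = ["\u0304", "\u0331", "\u030A", "\u0325", "\u0327"]

for mark in MARKS:
    # combining() returns the canonical combining class (0 for non-marks).
    print(f"U+{ord(mark):04X} {unicodedata.name(mark)}: "
          f"ccc={unicodedata.combining(mark)}")

# Canonical ordering sorts adjacent marks by class: 220 (below) before 230 (above).
seq = "g\u0304\u0331"
print([f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", seq)])

# U+0123 decomposes to g + U+0327, whose class stays 202 however it is rendered.
print([f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", "\u0123")])
```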

Perhaps what we need instead is special search folding between
upper-position and lower-position diacritics that are otherwise
identical in form: for example, between U+0304 (COMBINING MACRON) and
U+0331 (COMBINING MACRON BELOW), between U+030A (COMBINING RING ABOVE)
and U+0325 (COMBINING RING BELOW), and so forth for any diacritics whose
upper and lower forms may have the same meaning.
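A folding of this kind could be sketched in Python as below. The table `FOLD_PAIRS` and the functions `position_fold` and `folded_equal` are hypothetical illustrations, not anything Unicode defines. The approach assumed here is to decompose with NFD, replace each below mark by its above counterpart, and normalize again so that canonical ordering reflects the new classes.

```python
import unicodedata

# Hypothetical table pairing below-position marks with above-position marks
# of the same shape; illustrative only, not a Unicode-defined mapping.
FOLD_PAIRS = {
    "\u0331": "\u0304",  # COMBINING MACRON BELOW -> COMBINING MACRON
    "\u0325": "\u030A",  # COMBINING RING BELOW -> COMBINING RING ABOVE
}

def position_fold(s: str) -> str:
    """Fold below-position marks to their above-position counterparts."""
    decomposed = unicodedata.normalize("NFD", s)
    mapped = "".join(FOLD_PAIRS.get(c, c) for c in decomposed)
    # The replacement marks have a different combining class, so
    # normalize again to reapply canonical ordering.
    return unicodedata.normalize("NFD", mapped)

def folded_equal(a: str, b: str) -> bool:
    """True if a and b match once below/above mark pairs are folded."""
    return position_fold(a) == position_fold(b)

print(folded_equal("\u1E21", "g\u0331"))  # True: g with macron ~ g + macron below
print(folded_equal("\u1E6F", "t\u0304"))  # True: t with line below ~ t + macron
print(folded_equal("\u1E6F", "t\u0307"))  # False: dot above is not folded
```

A real search system would more likely apply such pairs as a tailoring alongside its existing case and accent folding; the two pairs here are only placeholders for whatever list of equivalent above/below diacritics is agreed on.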

Jim Allan


