Re: Merging combining classes, was: New contribution N2676

From: Jim Allan (jallan@smrtytrek.com)
Date: Tue Oct 28 2003 - 22:01:34 CST


I commented on what I saw as a problem with changing the rendered
position of diacritics from that shown in the charts, whether from above
to below or from below to above.

John Cowan responded:

> True. But that doesn't mean that the glyph that a particular font uses
> for the sequence <g, COMBINING MACRON BELOW> can't have the bar above
> the g. This is a pure rendering question.

I don't believe it is a pure rendering question.

From _The Unicode Standard 4.0_, 3.11 at
http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf:

<< If combining characters have different combining classes--for
example, when one nonspacing mark is above a base character form and
another is below it--then no distinction of graphic form or semantic
will result. >>

Later:

<< _D46 Combining class:_ A numeric value given to each combining
Unicode character that determines with which other combining characters
it typographically interacts. >>

From _The Unicode Standard 4.0_, 4.3 at
http://www.unicode.org/versions/Unicode4.0.0/ch04.pdf:

<< Each combining character has a normative canonical _combining class._
This class is used with the canonical ordering algorithm to determine
which combining characters interact typographically and to determine how
the canonical ordering of sequences of combining characters takes place. >>

This indicates that characters in different classes should not interact
typographically.
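
As a rough illustration of the canonical ordering side of this, here is
a minimal Python sketch using the standard unicodedata module. The
helper name classes() and the sample strings are my own choices for
illustration, not anything from the standard. It shows marks of
different classes being reordered under normalization, while marks of
the same class keep the order in which they were typed.

```python
import unicodedata

def classes(text):
    """Return the canonical combining class of each code point in text."""
    return [unicodedata.combining(ch) for ch in text]

# Macron above (class 230) typed before macron below (class 220).
mixed = "g\u0304\u0331"
# Two marks of the same class (both 230): acute, then macron.
same = "g\u0301\u0304"

mixed_nfd = unicodedata.normalize("NFD", mixed)
same_nfd = unicodedata.normalize("NFD", same)

# Different classes: the lower class is sorted first.
print(classes(mixed), "->", classes(mixed_nfd))
# Same class: the typed order is significant and is preserved.
print(classes(same), "->", classes(same_nfd))
```

Only the stored order changes here. Nothing in the sketch says where a
font will actually draw the marks.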

Cedilla belongs to class 202 meaning "Below attached" according to
http://www.unicode.org/Public/UNIDATA/UCD.html#Canonical_Combining_Class_Values.

However, from _The Unicode Standard 4.0_, 7.1:

<< A similar situation can be seen in the Latvian letter U+0123 LATIN
SMALL LETTER G WITH CEDILLA. In good Latvian typography, this character
is always shown with a rotated comma over the g, rather than a cedilla
below the g, because of the typographical design and layout issues
resulting from trying to place a cedilla below the descender loop of the
g. Poor Latvian fonts may substitute an acute accent for the rotated
comma, and handwritten or other printed forms may actually show the
cedilla below the g. >>
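
For anyone who wants to check the data, the following is a small Python
sketch, again using the standard unicodedata module (the variable names
are mine). It decomposes U+0123 and prints the combining class of each
resulting code point, assuming the Python build carries a reasonably
current copy of the character database.

```python
import unicodedata

g_cedilla = "\u0123"  # LATIN SMALL LETTER G WITH CEDILLA

# Canonical decomposition: the base letter followed by the combining mark.
decomposed = unicodedata.normalize("NFD", g_cedilla)

for ch in decomposed:
    # Print code point, character name, and canonical combining class.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} ccc={unicodedata.combining(ch)}")
```

The data records the mark as COMBINING CEDILLA with class 202 no matter
how a Latvian font chooses to draw the precomposed letter.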

Later at 7.7:

<< U+0326 COMBINING COMMA BELOW is sometimes rendered as U+0312
COMBINING TURNED COMMA ABOVE on a lowercase "g" to avoid conflict with
the descender. >>

So we have two cases noted where characters with combining class 202
(Below attached) can by Unicode specifications be rendered as if they
belonged to combining class 214 (Above attached).

In such cases they obviously do not interact with other combining class
202 characters but rather would interact with combining class 214
characters. Currently there are none--which is a blessing. :-)

But this still breaks the model.

Two exceptions to the model are perhaps reasonable (though they probably
should be noted somewhere in the data as specific exceptions to the
combining class model).

Or are they exceptions? Is it to be understood, as John Cowan does, that
a combining character with a combining class type of "Below ...." may in
fact be rendered above its base character by a font designer? Is it to
be understood that any combining character with a combining class type
of "Above ...." may in fact be rendered below its base character by a
font designer? Would such fonts be compliant with Unicode?

Besides this apparent breaking of one of the reasons for combining
classes, there is also the possibility that such a font would change the
semantics of a character.

In IPA a diaeresis above indicates centralization (normally only applied
to vowels). A diaeresis below indicates breathy voice (and might be
expected to be used on almost any base IPA character). I suppose a font
might be intelligent enough to distinguish vowel characters from
consonants in strict IPA usage, and accordingly change _g_ followed by
diaeresis beneath to _g_ with a diaeresis above but leave the diaeresis
beneath for the vowel _y_. But what if the font user is following American
conventions and the _y_ is a consonant and U+0265 is used as a vowel
instead? Or what if the user intends the diaeresis below for some other
idiosyncratic purpose, perhaps indicating a kind of stress? Will the
user be happy to have it sometimes rendered above the characters to
which it applies and sometimes not?

What if the user intends a macron above _l_ to indicate a mid tone on a
syllabic _l_ but the font thinks that macron looks better beneath, so
we now have indication of a retracted syllabic _l_?

John Cowan wrote:

> So a smart IPA-specific font could render <g, U+0325> with the ring
> above.

What if the user is not following IPA standards and wants to make a
distinction between ring above and ring below for particular purposes?
Perhaps the author is doing mathematics.

We don't expect fonts to do smart quotation. Fonts shouldn't
second-guess the user.

If the author uses the font when composing, the author will see that the
font is misbehaving, at least with respect to what the author wants.
Software that thinks it knows better than the user is often annoying.

More dangerous is the case where, after the text is input, the font is
changed to one that behaves differently and the changes in rendering are
not noticed.

Suddenly a bar under a _g_ that was intended to indicate a fricative
becomes a macron above the _g_ indicating doubling. The author might be
making the distinction based on position. Or the author might be citing
other works and wish to duplicate the typography of what is being cited,
which is good practice if the author isn't altogether sure about the
meaning of some of the diacritics in the citation.

To have combining characters in general changing position depending on
the font doesn't seem to me to be desirable, especially in technical
work where the position of the diacritic is sometimes as important as
its shape.

Jim Allan


