Re: Merging combining classes, was: New contribution N2676

From: Peter Kirk (peterkirk@qaya.org)
Date: Wed Oct 29 2003 - 05:12:22 CST


On 28/10/2003 20:01, Jim Allan wrote:

> ...
>
> From _The Unicode Standard 4.0_, 3.11 at
> http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf:
>
> << If combining characters have different combining classes--for
> example, when one nonspacing mark is above a base character form and
> another is below it--then no distinction of graphic form or semantic
> will result. >>
>
> Later:
>
> << _D46 Combining class:_ A numeric value given to each combining
> Unicode character that determines with which other combining
> characters it typographically interacts. >>
>
> From _The Unicode Standard 4.0_, 4.3 at
> http://www.unicode.org/versions/Unicode4.0.0/ch04.pdf:
>
> << Each combining character has a normative canonical _combining
> class._ This class is used with the canonical ordering algorithm to
> determine which combining characters interact typographically and to
> determine how the canonical ordering of sequences of combining
> characters takes place. >>
>
> This indicates that characters in different classes should not
> interact typographically.

Rather, it defines that they do not. But since this is not true on any
reasonable intuitive definition of "interact typographically" (as we
have seen with Hebrew vowel points), this statement makes sense only as
a counterintuitive definition of "interact typographically".
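
To make the Hebrew case concrete, here is a minimal sketch in Python. It assumes that the standard unicodedata module reflects the combining classes in the Unicode Character Database; the helper name classes is only for illustration. Patah and hiriq both sit below the base letter and clearly affect each other's placement, yet because their classes differ, canonical ordering is free to swap them:

```python
import unicodedata

LAMED = "\u05DC"  # HEBREW LETTER LAMED, combining class 0 (base character)
PATAH = "\u05B7"  # HEBREW POINT PATAH, combining class 17
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ, combining class 14


def classes(text):
    """Return the canonical combining class of each character (illustrative helper)."""
    return [unicodedata.combining(ch) for ch in text]


# Patah first, then hiriq: the order a writer might intend visually.
typed = LAMED + PATAH + HIRIQ

# Canonical ordering sorts adjacent marks with different nonzero classes
# by class value, so the hiriq (14) moves ahead of the patah (17).
normalized = unicodedata.normalize("NFD", typed)

print(classes(typed))
print(classes(normalized))
print(normalized == LAMED + HIRIQ + PATAH)
```

The reordering happens purely on the class numbers; nothing in the process asks whether the two marks collide on the page.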

>
> Cedilla belongs to class 202 meaning "Below attached" according to
> http://www.unicode.org/Public/UNIDATA/UCD.html#Canonical_Combining_Class_Values.
>
>
> However, from _The Unicode Standard 4.0_, 7.1:
>
> << A similar situation can be seen in the Latvian letter U+0123 LATIN
> SMALL LETTER G WITH CEDILLA. In good Latvian typography, this
> character is always shown with a rotated comma over the g, rather than
> a cedilla below the g, because of the typographical design and layout
> issues resulting from trying to place a cedilla below the descender
> loop of the g. Poor Latvian fonts may substitute an acute accent for
> the rotated comma, and handwritten or other printed forms may actually
> show the cedilla below the g. >>
>
> Later at 7.7:
>
> << U+0326 COMBINING COMMA BELOW is sometimes rendered as U+0312
> COMBINING TURNED COMMA ABOVE on a lowercase "g" to avoid conflict with
> the descender. >>
>
> So we have two cases noted where characters with combining class 202
> (Below attached) can by Unicode specifications be rendered as if they
> belonged to combining class 214 (Above attached).
>
> In such cases they obviously do not interact with other combining
> class 202 characters but rather would interact with combining class
> 214 characters. Currently there are none--which is a blessing. :-)
>
> But this still breaks the model.

Also, on an intuitive definition of "interact typographically", this
shifted comma below would interact with any centred above accent, class
230. For that matter, even in its normal position a comma below, class
202, would interact e.g. with a macron below, class 220. True, only one
ordering is possible as the macron cannot come closer to the character
than the attached comma below, so there is no problem for the canonical
ordering model. But the position of the macron still has to be shifted
to avoid cutting across the comma below, so the "interact
typographically" rule is broken. The implication is that a font designer
cannot assume that glyph positioning adjustments are required only
between adjacent characters in canonical order, despite what the
"interact typographically" rule might suggest.
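
For anyone who wants to check the class values rather than take the numbers in this thread on trust, the following is a rough sketch using Python's unicodedata module. The helper describe and the constant names are my own, not anything from the standard. As far as I can tell the database assigns 202 to U+0327 COMBINING CEDILLA and lists U+0326 COMBINING COMMA BELOW under 220, so the lookup is worth running. The last part shows that the cedilla of U+0123 is ordered as a below mark even where a font draws it above the g:

```python
import unicodedata

CEDILLA = "\u0327"
COMMA_BELOW = "\u0326"
TURNED_COMMA_ABOVE = "\u0312"
MACRON_BELOW = "\u0331"
MACRON_ABOVE = "\u0304"


def describe(mark):
    """Return (character name, canonical combining class); illustrative helper."""
    return unicodedata.name(mark), unicodedata.combining(mark)


for mark in (CEDILLA, COMMA_BELOW, TURNED_COMMA_ABOVE, MACRON_BELOW, MACRON_ABOVE):
    print(describe(mark))

# U+0123 LATIN SMALL LETTER G WITH CEDILLA decomposes to g + cedilla,
# whatever shape the font actually draws.
g_cedilla = unicodedata.normalize("NFD", "\u0123")

# A macron above typed before the cedilla is moved after it, because
# ordering looks only at the class numbers (202 before 230).
typed = "g" + MACRON_ABOVE + CEDILLA
reordered = unicodedata.normalize("NFD", typed)
print(reordered == "g" + CEDILLA + MACRON_ABOVE)
```

So the data tells a renderer how the marks are ordered, not where the glyphs end up; the positioning still has to be worked out from the actual shapes.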

-- 
Peter Kirk
peter@qaya.org (personal)
peterkirk@qaya.org (work)
http://www.qaya.org/

