From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Jan 21 2008 - 14:36:26 CST
Kent said:
> Naa. The stacked letters are of equal size, untypical for a
> base letter - diacritic combination (even if the diacritic is
> a letter).
Yeah, I agree with that.
>
> > that stacking to the Reduktionsvokale (alpha, schwa, upsilon,
> > dotless-smallcap-i) would pose a problem.
>
> I would instead suggest that all (it's just a handful) of these
> stacked letter pairs should be encoded as atomic characters
> (no decomposition).
That's fine for the 8-bit hacks to represent this. It doesn't
seem like a good precedent to set for Unicode representation
of such stacking conventions. This is the kind of writing
convention for which lightweight markup along the lines
of the Manuel de Codage for hieroglyphics makes sense.
> I don't see why there should be any problem in principle to encode
> COMBINING PARENTHESISED DOUBLE RIGHT HOOK BELOW, COMBINING
> PARENTHESISED DIAERESIS BELOW, COMBINING FAT TILDE, etc.
Except that it is more goofiness for the representation
of "characters". Every time somebody invents conventions
for grouping and stacking of characters into boxed
chunks on paper (the way you get indefinite numbers of
squared katakana chunks in Japanese), character encoders
should not just automatically stick those in as units
in the Universal Character Set. Only if there are really
good implementation arguments that any other approach isn't
workable should we end up with non-decomposed encodings, IMO.
>
> Note also that while cedilla and ogonek are not only happenstance
> attached, they are formally attached (by their combining category),
> and can never come below other combining marks below.
Not exactly true. You can always attach an ogonek to a cedilla. ;-)
And while ccc=202 *is* less than ccc=230 and is described
as "Attached_Below", the combining class does not *force* the
glyph design, nor require there to be no visual space between
an ogonek and its base. And there are plenty of cases where the
only reasonable fallback for a renderer would be to simply
stick an unattached ogonek glyph underneath a base.
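For readers who want to check the combining classes mentioned above, here is a minimal sketch using Python's standard unicodedata module. The choice of marks and the helper names are illustrative assumptions, and the values reported depend on the Unicode version bundled with the interpreter. It also shows why an ogonek can follow a cedilla: both have the same combining class, so canonical reordering leaves their relative order alone.

```python
import unicodedata

# Marks discussed in this thread, plus an acute accent as a ccc=230 reference.
MARKS = ["\u0327", "\u0328", "\u0323", "\u031C", "\u0301"]

def combining_classes(marks=MARKS):
    """Map each mark's Unicode name to its canonical combining class."""
    return {unicodedata.name(ch): unicodedata.combining(ch) for ch in marks}

def nfd(text):
    """Canonical decomposition, which includes canonical reordering."""
    return unicodedata.normalize("NFD", text)

# Cedilla and ogonek share a combining class, so neither order is
# rearranged and the two sequences are not canonically equivalent.
CEDILLA_THEN_OGONEK = "c\u0327\u0328"
OGONEK_THEN_CEDILLA = "c\u0328\u0327"

if __name__ == "__main__":
    for name, ccc in combining_classes().items():
        print(f"{name}: ccc={ccc}")
    print(nfd(CEDILLA_THEN_OGONEK) == nfd(OGONEK_THEN_CEDILLA))
```

The combining class only governs ordering under normalization; it says nothing about how a font draws or positions the glyph.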
> (Even if you try,
> canonical reordering will move attached marks "past" unattached marks.)
This, however, is generally the case, in normalized text
at least. And in principle, you would want a renderer to
display <o, dot-below, ogonek> the same as <o, ogonek, dot-below>,
since they are canonical equivalents. Of course, for a lot
of good reasons, people generally don't try to mix ogoneks
and cedillas with other diacritics below -- and one of those
reasons is why IPA doesn't use ogoneks to indicate nasalization.
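The canonical equivalence of the two sequences can be checked with a short sketch, again assuming Python's standard unicodedata module; the constant and function names are hypothetical. Both orders should normalize to the same string, with the ogonek (ccc=202) sorted ahead of the dot below (ccc=220).

```python
import unicodedata

# The two orderings discussed above: <o, dot-below, ogonek> and <o, ogonek, dot-below>.
O_DOT_OGONEK = "o\u0323\u0328"
O_OGONEK_DOT = "o\u0328\u0323"

def normalize(text, form="NFD"):
    """Apply a Unicode normalization form (NFD or NFC here)."""
    return unicodedata.normalize(form, text)

def codepoints(text):
    """Render a string as a list of U+XXXX labels for inspection."""
    return [f"U+{ord(ch):04X}" for ch in text]

def canonically_equivalent(a, b):
    """Two strings are canonically equivalent if their NFD forms match."""
    return normalize(a, "NFD") == normalize(b, "NFD")

if __name__ == "__main__":
    print(codepoints(normalize(O_DOT_OGONEK)))
    print(codepoints(normalize(O_OGONEK_DOT)))
    print(canonically_equivalent(O_DOT_OGONEK, O_OGONEK_DOT))
```

A renderer that works from normalized text never sees the difference between the two input orders, which is the reason it should display them identically.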
> However, the right hook here referred to CAN come below other marks
> (just like comma below). See the example on page 7 of
> http://www.sprachatlas.phil.uni-erlangen.de/materialien/Teuthonista_Handbuch.pdf.
I saw that, but the use of the inverted breve under l was
unexplained and seemed inconsistent with the explanation just
below that it indicated semivowels. The system seems to have
syllabification and articulation mixed up somehow.
> (So COMBINING RIGHT HOOK BELOW cannot be unified with COMBINING OGONEK...)
I wouldn't say "cannot", but see all this as another argument
to identify the German dialectological usage of the openness diacritic
as U+031C, and just use special fonts if it has to be displayed
looking "just like an iota".
--Ken