RE: Proposal to encode three combining diacritical marks for Low German dialect writing

From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Jan 21 2008 - 14:36:26 CST


    Kent said:

    > Naa. The stacked letters are of equal size, untypical for a
    > base letter - diacritic combination (even if the diacritic is
    > a letter).

    Yeah, I agree with that.

    >
    > > that stacking to the Reduktionsvokale (reduced vowels: alpha, schwa,
    > > upsilon, dotless small-cap i) would pose a problem.
    >
    > I would instead suggest that all (it's just a handful) of these
    > stacked letter pairs should be encoded as atomic characters
    > (no decomposition).

    That's fine for the 8-bit hacks to represent this. It doesn't
    seem like a good precedent to set for Unicode representation
    of such stacking conventions. This is the kind of writing
    convention for which lightweight markup along the lines
    of the Manuel de Codage for hieroglyphics makes sense.

    > I don't see why there should be any problem in principle to encode
    > COMBINING PARENTHESISED DOUBLE RIGHT HOOK BELOW, COMBINING
    > PARENTHESISED DIAERESIS BELOW, COMBINING FAT TILDE, etc.

    Except that it is more goofiness for the representation
    of "characters". Every time somebody invents conventions
    for grouping and stacking of characters into boxed
    chunks on paper (the way you get indefinite numbers of
    squared katakana chunks in Japanese), character encoders
    should not just automatically stick those in as units
    in the Universal Character Set. Only if there are really
    good implementation arguments that any other approach isn't
    workable should we end up with non-decomposed encodings, IMO.

    >
    > Note also that cedilla and ogonek are not merely attached by happenstance;
    > they are formally attached (by their combining class),
    > and can never come below other combining marks below.

    Not exactly true. You can always attach an ogonek to a cedilla. ;-)
    And while ccc=202 *is* less than ccc=230 and is described
    as "Attached_Below", the combining class does not *force* the
    glyph design, nor require there to be no visual space between
    an ogonek and its base. And there are plenty of cases where the
    only reasonable fallback for a renderer would be to simply
    stick an unattached ogonek glyph underneath a base.
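    To make the combining-class numbers concrete, here is a minimal sketch in
    Python using only the standard `unicodedata` module, which prints the
    canonical combining class (ccc) of the marks under discussion. The helper
    name `ccc_table` and the particular selection of marks are illustrative
    assumptions, not anything defined by the UCD. The values are only ordering
    keys for normalization; they say nothing about glyph shape or attachment
    in a given font.

```python
import unicodedata

# Illustrative selection of marks from the discussion, keyed by UCD name.
MARKS = {
    "COMBINING CEDILLA": "\u0327",
    "COMBINING OGONEK": "\u0328",
    "COMBINING DOT BELOW": "\u0323",
    "COMBINING LEFT HALF RING BELOW": "\u031c",
    "COMBINING ACUTE ACCENT": "\u0301",
}

def ccc_table(marks: dict[str, str]) -> list[tuple[str, int, str]]:
    """Hypothetical helper: (code point, ccc, name) for each mark."""
    rows = []
    for name, ch in marks.items():
        # Sanity check that each name really matches its code point.
        assert unicodedata.name(ch) == name
        rows.append((f"U+{ord(ch):04X}", unicodedata.combining(ch), name))
    return rows

for cp, ccc, name in ccc_table(MARKS):
    print(f"{cp}  ccc={ccc:<3}  {name}")
# Expected: cedilla and ogonek 202 (Attached_Below), dot below and
# left half ring below 220 (Below), acute accent 230 (Above).
```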

    > (Even if you try,
    > canonical reordering will move attached marks "past" unattached marks.)

    This, however, is generally the case, in normalized text
    at least. And in principle, you would want a renderer to
    display <o, dot-below, ogonek> the same as <o, ogonek, dot-below>,
    since they are canonical equivalents. Of course, for a lot
    of good reasons, people generally don't try to mix ogoneks
    and cedillas with other diacritics below -- and one of those
    reasons is why IPA doesn't use ogoneks to indicate nasalization.
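    A minimal sketch of the canonical ordering step, assuming input that needs
    no decomposition (so the result can be compared directly against NFD).
    `canonical_order` is a hypothetical helper, not a library API: it stably
    sorts each run of nonzero-ccc marks, which is why <o, dot-below, ogonek>
    and <o, ogonek, dot-below> come out identical, while two marks of the same
    class (cedilla and ogonek, both 202) keep their original order and so
    remain distinct sequences.

```python
import unicodedata

def canonical_order(s: str) -> str:
    """Hypothetical helper: the Canonical Ordering Algorithm without decomposition.

    Adjacent marks A, B are swapped whenever ccc(A) > ccc(B) > 0, repeated
    until nothing changes. Nothing is ever swapped across a starter
    (ccc == 0), and marks with equal ccc keep their relative order.
    """
    chars = list(s)
    changed = True
    while changed:
        changed = False
        for i in range(len(chars) - 1):
            a = unicodedata.combining(chars[i])
            b = unicodedata.combining(chars[i + 1])
            if a > b > 0:
                chars[i], chars[i + 1] = chars[i + 1], chars[i]
                changed = True
    return "".join(chars)

dot_then_ogonek = "o\u0323\u0328"  # <o, DOT BELOW (ccc 220), OGONEK (ccc 202)>
ogonek_then_dot = "o\u0328\u0323"  # <o, OGONEK (ccc 202), DOT BELOW (ccc 220)>

# Different ccc values: the attached ogonek is moved ahead of the dot below.
print(canonical_order(dot_then_ogonek) == ogonek_then_dot)   # True
# Python's NFD agrees, so the two spellings are canonical equivalents.
print(unicodedata.normalize("NFD", dot_then_ogonek)
      == unicodedata.normalize("NFD", ogonek_then_dot))      # True

# Same ccc (both 202): order is preserved, so these are NOT equivalent.
cedilla_ogonek = "o\u0327\u0328"
ogonek_cedilla = "o\u0328\u0327"
print(canonical_order(cedilla_ogonek) == canonical_order(ogonek_cedilla))  # False
```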

    > However, the right hook here referred to CAN come below other marks
    > (just like comma below). See the example on page 7 of
    > http://www.sprachatlas.phil.uni-erlangen.de/materialien/Teuthonista_Handbuch.pdf.

    I saw that, but the use of the inverted breve under l was
    unexplained and seemed inconsistent with the explanation just
    below that it indicated semivowels. The system seems to have
    syllabification and articulation mixed up somehow.

    > (So COMBINING RIGHT HOOK BELOW cannot be unified with COMBINING OGONEK...)

    I wouldn't say "cannot", but I see all this as another argument
    to identify the German dialectological usage of the openness diacritic
    as U+031C, and just use special fonts if it has to be displayed
    looking "just like an iota".

    --Ken



    This archive was generated by hypermail 2.1.5 : Mon Jan 21 2008 - 14:38:44 CST