Re: Combining umlauts (e.g. over a base letter)

From: Doug Ewell (dewell@roadrunner.com)
Date: Sat Feb 23 2008 - 16:15:42 CST

    Karl Pentzlin <karl dash pentzlin at acssoft dot de> wrote:

    > Which of the possible solutions is to be preferred (assuming that
    > there is clear evidence presented for a superscript ):
    >
    > 1. Encode a COMBINING LATIN SMALL LETTER U UMLAUT (which implies that
    > such a letter is not considered as precomposed, as there is no obvious
    > decomposition now - U+0367 U+0308 does not apply)
    > 2. Encode a COMBINING SMALL DIAERESIS (or COMBINING SUPERSCRIPT
    > DIAERESIS) with an informative note: suited for combinations with
    > combining letters, e.g. to mark them as umlaut
    > 3. Expand the semantics of ZWJ/ZWNJ in a way
    > - that U+006F U+0367 ZWJ U+0308 yields the wanted entity,
    > - that ZWNJ after such entities "switches back" to the application
    > of subsequent diacritics to the whole entity.
    > 4. Something completely different.
    >
    > I prefer 2. as it handles this case without inventing any new
    > mechanism and also enables superscript / with a single new character,
    > and does not raise any questions about precomposedness of combining
    > letters.

    I prefer (1) because there don't seem to be enough of these cases
    (outside of this one work by Hotzenköcherle) to justify a productive
    mechanism, and because the whole notion of stacking combining marks in
    this "Russian doll" way adds a great deal of implementation complexity
    in exchange for a small edge-case benefit.
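
    To illustrate why the existing sequence does not already give the
    wanted result, here is a minimal Python sketch using the standard
    unicodedata module. The names SEQ, describe and canonically_equivalent
    are made up for this example. It only inspects the character
    properties; it says nothing about how a given renderer draws the
    marks.

```python
import unicodedata

# Sequence from option 3 without the ZWJ:
# o, COMBINING LATIN SMALL LETTER U, COMBINING DIAERESIS.
SEQ = "\u006F\u0367\u0308"

def describe(text):
    """Return (code point, name, canonical combining class) per character."""
    return [(f"U+{ord(c):04X}", unicodedata.name(c), unicodedata.combining(c))
            for c in text]

def canonically_equivalent(a, b):
    """Two strings are canonically equivalent if their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

if __name__ == "__main__":
    for row in describe(SEQ):
        print(*row)
    # U+0367 has no canonical decomposition of its own (empty string).
    print(repr(unicodedata.decomposition("\u0367")))
    # Swapping the two marks yields a distinct, non-equivalent sequence.
    print(canonically_equivalent(SEQ, "\u006F\u0308\u0367"))
```

    Both marks carry canonical combining class 230, so normalization
    keeps them in the order typed, and the character data has nothing
    that ties the diaeresis to the superscript u rather than to the
    base letter.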

    --
    Doug Ewell  *  Fullerton, California, USA  *  RFC 4645  *  UTN #14
    http://www.ewellic.org
    http://www1.ietf.org/html.charters/ltru-charter.html
    http://www.alvestrand.no/mailman/listinfo/ietf-languages
    

