Combining umlauts (e.g. over a base letter)

From: Karl Pentzlin (karl-pentzlin@acssoft.de)
Date: Sat Feb 23 2008 - 05:16:55 CST

    Am Samstag, 23. Februar 2008 um 01:51 schrieb Asmus Freytag
     (Re: i with macron over an e - Do U+0365 and U+2071 lose their dot
     when accented like U+0069?):

    AF> ... The reason for that
    AF> is that in Unicode, you can't apply a diacritic to a diacritic, you can
    AF> only apply a diacritic to a sequence.
    AF> ... A macron applied to a sequence of <e , combining dotless i> should be
    AF> rendered as if it applied to the whole.

    As far as I can tell so far, this seems sufficient for e +
    combining i + macron, where the macron denotes length for the vowel
    written as e + combining i.

    But how should combining umlauts (e.g. over an o, like the entity
    marked in red in the attached scan) be handled?

    o + combining u + trema (U+006F U+0367 U+0308) does not yield an
    o + superscript ü, but an o + superscript u + a trema above that
    whole combination, clearly too wide to be recognized as an umlaut
    marker for the superscript u.
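
    As an illustration of why the plain sequence behaves this way, here
    is a minimal Python sketch using the standard unicodedata module.
    It lists the three code points with their canonical combining
    classes and checks how normalization treats them. The helper name
    describe() is made up for this example, and the sketch only
    inspects character properties; it says nothing about how a
    particular font or renderer draws the result.

```python
import unicodedata

# The sequence from the message: o + combining u + combining diaeresis.
SEQ = "\u006F\u0367\u0308"


def describe(text):
    """Return (code point, combining class, name) for each character.

    Hypothetical helper for this illustration only.
    """
    return [
        (f"U+{ord(ch):04X}", unicodedata.combining(ch), unicodedata.name(ch))
        for ch in text
    ]


for code_point, ccc, name in describe(SEQ):
    print(f"{code_point}  ccc={ccc:3d}  {name}")

# Both marks have combining class 230 (above), so normalization keeps
# their relative order, and the diaeresis cannot combine with the 'o'
# across the intervening combining u.
nfc_as_written = unicodedata.normalize("NFC", SEQ)

# With the marks swapped, the diaeresis is adjacent to the 'o' and NFC
# composes the pair into U+00F6, i.e. an o-umlaut carrying a combining u.
nfc_swapped = unicodedata.normalize("NFC", "\u006F\u0308\u0367")

print(nfc_as_written == SEQ)
print(nfc_swapped == "\u00F6\u0367")
```

    In both orders each mark is defined as applying to the base letter
    (with whatever marks precede it), never to the other mark alone,
    which is the limitation described above.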

    Which of the possible solutions is to be preferred (assuming that
    clear evidence is presented for a superscript ü):

    1. Encode a COMBINING LATIN SMALL LETTER U UMLAUT
       (which implies that such a letter is not considered as precomposed,
        as there is no obvious decomposition now - U+0367 U+0308 does not
        apply)
    2. Encode a COMBINING SMALL DIAERESIS (or COMBINING SUPERSCRIPT
        DIAERESIS) with an informative note:
         suited for combinations with combining letters, e.g. to mark
          them as umlaut
    3. Expand the semantics of ZWJ/ZWNJ in a way
       - that U+006F U+0367 ZWJ U+0308 yields the wanted entity,
       - that ZWNJ after such entities "switches back" to the application
         of subsequent diacritics to the whole entity.
    4. something completely different.
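
    For option 3, the candidate sequence can at least be written down
    and checked against existing character properties. The sketch
    below is only an illustration: the names ZWJ, OPTION_3 and
    is_stable() are made up here, and the rendering behaviour that
    option 3 asks for is a proposal in this message, not something the
    code demonstrates.

```python
import unicodedata

ZWJ = "\u200D"  # ZERO WIDTH JOINER

# Option 3 as written in the message: o, combining u, ZWJ, diaeresis.
OPTION_3 = "\u006F\u0367" + ZWJ + "\u0308"


def is_stable(text):
    """True if NFC and NFD both leave the text unchanged.

    Hypothetical helper for this illustration only.
    """
    return all(
        unicodedata.normalize(form, text) == text for form in ("NFC", "NFD")
    )


# ZWJ is a format character (category Cf) with combining class 0, so
# canonical reordering never moves the diaeresis across it.
print(unicodedata.name(ZWJ), unicodedata.category(ZWJ), unicodedata.combining(ZWJ))
print(is_stable(OPTION_3))
```

    So the sequence would survive normalization as typed; whether ZWJ
    should change which character the diaeresis attaches to is exactly
    the open question.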

    I prefer 2., as it handles this case without inventing any new
    mechanism, also enables superscript ä/ö with a single new
    character, and does not raise any questions about the
    precomposedness of combining letters.

    Any suggestions or opinions?
    - Karl Pentzlin



    modifier_letter_u-umlaut.png
