Re: Combining umlauts (e.g. over a base letter)

From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Sat Feb 23 2008 - 15:36:49 CST

    On 2/23/2008 3:16 AM, Karl Pentzlin wrote:
    > On Saturday, 23 February 2008 at 01:51, Asmus Freytag wrote
    > (Re: i with macron over an e - Do U+0365 and U+2071 lose their dot
    > when accented like U+0069?):
    >
    > AF> ... The reason for that
    > AF> is that in Unicode, you can't apply a diacritic to a diacritic, you can
    > AF> only apply a diacritic to a sequence.
    > AF> ... A macron applied to a sequence of <e , combining dotless i> should be
    > AF> rendered as if it applied to the whole.
    >
    > As far as I can tell so far, this seems sufficient for the e +
    > combining i + macron, as it is used to denote length for the vowel
    > denoted by e + combining i.
    >
    > But, how should combining umlauts (e.g. over an o, as the entity marked
    > in red in the attached scan) be handled?
    >
    etc. are used here atomically. I don't think you can represent them
    with sequences of existing codes. As the policy of the UTC has been
    for years to approve superscripted combining letters *strictly* based
    on evidence of actual use, there shouldn't be a conceptual problem to
    encoding etc, on a one-by-one basis as needed. Whatever modifications
    to the letter forms are present in the superscript combining mark
    would then not be considered a productive use of diacritics on
    diacritics, and, on the whole, one wouldn't expect that sort of
    generative method.

    > o + combining u + trema: U+006F U+0367 U+0308 thus does not yield an
    > o + superscript ü, but an o + superscript u + a trema above that
    > combination, clearly too wide to be recognized as an umlaut marker
    > for the superscript u.
    >
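
    For reference, here is a minimal sketch of how the sequence quoted
    above looks at the character-data level. It uses Python's standard
    unicodedata module; the names SEQ, describe and is_stable are only
    illustrative. It shows nothing about rendering, only that both marks
    attach to the base and that nothing in the encoding binds U+0308 to
    U+0367 rather than to the whole.

```python
import unicodedata

# The sequence from the message:
# o + COMBINING LATIN SMALL LETTER U + COMBINING DIAERESIS
SEQ = "\u006F\u0367\u0308"


def describe(text):
    """Return (code point, name, canonical combining class) per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.combining(ch))
        for ch in text
    ]


for row in describe(SEQ):
    print(row)

# Both marks share the same "above" combining class, so normalization
# neither reorders them nor composes anything with the base letter.
is_stable = all(
    unicodedata.normalize(form, SEQ) == SEQ for form in ("NFC", "NFD")
)
print("unchanged by NFC/NFD:", is_stable)
```

    The base has combining class 0 and both marks have class 230, so the
    sequence is simply "base plus two marks above" as far as the data is
    concerned.
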
    > Which of the possible solutions is to be preferred (assuming that
    > there is clear evidence presented for a superscript ü):
    >
    > 1. Encode a COMBINING LATIN SMALL LETTER U UMLAUT
    > (which implies that such a letter is not considered as precomposed,
    > as there is no obvious decomposition now - U+0367 U+0308 does not
    > apply)
    >
    right.

    Decomposition does not apply, as decomposition is not applied to
    combining characters. The exception is U+0344 (itself considered
    "discouraged"), which merely represents the effect of stacking two
    combining marks that graphically apply to the same base character,
    not to each other. (The other three are singleton decompositions.)
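
    The point about decompositions can be checked against the character
    database. The following is a rough sketch using Python's standard
    unicodedata module; the helper name canonical_decomposition is made
    up for illustration, and the choice of U+0340, U+0341 and U+0343 as
    "the other three" is my reading of the remark above, not something
    stated in it.

```python
import unicodedata


def canonical_decomposition(cp):
    """Return the canonical decomposition of a code point as a list of
    code points, or an empty list if it has none."""
    mapping = unicodedata.decomposition(chr(cp))
    # A leading '<tag>' marks a compatibility mapping, which is not
    # a canonical decomposition.
    if not mapping or mapping.startswith("<"):
        return []
    return [int(part, 16) for part in mapping.split()]


# Combining marks in the 0300 block that carry a decomposition,
# plus U+0367 (combining small u) for comparison.
for cp in (0x0340, 0x0341, 0x0343, 0x0344, 0x0367):
    parts = canonical_decomposition(cp)
    shown = " ".join(f"U+{p:04X}" for p in parts) or "(none)"
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))}: {shown}")
```

    Only U+0344 maps to two marks (U+0308 U+0301); the others map to a
    single mark each, and U+0367 has no decomposition at all.
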
    > 2. Encode a COMBINING SMALL DIAERESIS (or COMBINING SUPERSCRIPT
    > DIAERESIS) with an informative note:
    > suited for combinations with combining letters, e.g. to mark
    > them as umlaut
    > 3. Expand the semantics of ZWJ/ZWNJ in a way
    > - that U+006F U+0367 ZWJ U+0308 yields the wanted entity,
    > - that ZWNJ after such entities "switches back" to the application
    > of subsequent diacritics to the whole entity.
    > 4. something completely different.
    >
    > I prefer 2. as it handles this case without inventing any new
    > mechanism and also enables superscript / with a single new
    > character, and does not raise any questions about precomposedness of
    > combining letters.
    >
    I suggest that creating such a character will be more complicated than
    solution 1 and the savings are too small.

    A./
    > Any suggestions or opinions?
    > - Karl Pentzlin


