From: Karl Pentzlin (firstname.lastname@example.org)
Date: Sat Feb 23 2008 - 05:16:55 CST
On Saturday, 23 February 2008 at 01:51, Asmus Freytag wrote
(Re: i with macron over an e - Do U+0365 and U+2071 lose their dot
when accented like U+0069?):
AF> ... The reason for that
AF> is that in Unicode, you can't apply a diacritic to a diacritic, you can
AF> only apply a diacritic to a sequence.
AF> ... A macron applied to a sequence of <e , combining dotless i> should be
AF> rendered as if it applied to the whole.
As far as I can tell so far, this seems sufficient for e +
combining i + macron, since the macron is used there to denote the
length of the vowel written as e + combining i.
But how should combining umlauts (e.g. a ü above an o, like the entity
marked in red in the attached scan) be handled?
o + combining u + trema, i.e. U+006F U+0367 U+0308, thus does not yield
an o + superscript ü, but an o + superscript u with a trema above the
whole combination, clearly too wide to be recognized as an umlaut
marker for the superscript ü.
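To make the sequence concrete, here is a minimal sketch in Python that inspects it with the standard unicodedata module. It only reports the character names, the canonical combining classes, and the effect of normalization. It says nothing about how a given font or renderer positions the marks. The names SEQ and describe are hypothetical helpers introduced for this illustration.

```python
import unicodedata

# The sequence from the message: o + combining u + combining diaeresis.
SEQ = "\u006F\u0367\u0308"

def describe(text):
    """Return (code point, character name, canonical combining class) per character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.combining(ch))
            for ch in text]

for row in describe(SEQ):
    print(row)

# U+0367 and U+0308 share combining class 230, so canonical ordering keeps
# them in the order typed and NFC leaves the sequence untouched.
nfc_unchanged = unicodedata.normalize("NFC", SEQ) == SEQ

# With the two marks swapped, the diaeresis composes with the base o instead,
# giving U+00F6 followed by the combining u: a different entity altogether.
reordered_nfc = unicodedata.normalize("NFC", "\u006F\u0308\u0367")

print(nfc_unchanged, [f"U+{ord(c):04X}" for c in reordered_nfc])
```

Since both marks are in the same combining class, their relative order is significant and is preserved, but nothing in the character data ties the diaeresis to the combining u rather than to the combination as a whole.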
Which of the possible solutions is to be preferred (assuming that
there is clear evidence presented for a superscript ü):
1. Encode a COMBINING LATIN SMALL LETTER U UMLAUT
(which implies that such a letter is not considered as precomposed,
as there is no obvious decomposition now - U+0367 U+0308 does not
2. Encode a COMBINING SMALL DIAERESIS (or COMBINING SUPERSCRIPT
DIAERESIS) with an informative note:
· suited for combinations with combining letters, e.g. to mark
them as umlaut
3. Expand the semantics of ZWJ/ZWNJ in a way
- that U+006F U+0367 ZWJ U+0308 yields the wanted entity,
- that ZWNJ after such entities "switches back" to the application
of subsequent diacritics to the whole entity.
4. something completely different.
I prefer 2. as it handles this case without inventing any new
mechanism and also enables superscript ö/ä with a single new
character, and does not raise any questions about precomposedness of
Any suggestions or opinions?
- Karl Pentzlin