Re: Why no combining‐character form for U+00F8?

From: Jukka K. Korpela
Date: Thu, 16 Aug 2012 18:55:40 +0300

2012-08-16 18:31, Ian Clifton wrote:

> Having just been to Norway, and wanting to email my friends all about
> it, I came across a curiosity: neither of the combining characters
> U+0337, U+0338 seem to work in usually‐reliable Emacs, and indeed
> U+00F8 LATIN SMALL LETTER O WITH STROKE doesn’t seem to have a
> decomposed form, according to UnicodeData.txt. I’m sure this can’t be an
> oversight?

It isn’t an oversight but an intentional decision.

The letter “ø” (historically originating from a ligature of “o” and “e”)
could have been analyzed as consisting of the letter “o” and a diacritic
mark. Instead, it was coded as an “atomic” character that is not
decomposable in any way.

This may sound illogical, as another Scandinavian letter, “ö” (also
originating from a ligature of “o” and “e”, the latter in small size
above the “o”) is encoded as canonically decomposable.
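
The difference can be checked against the character database. What follows is only a minimal sketch, assuming a Python 3 interpreter whose standard unicodedata module reflects UnicodeData.txt; the helper name decomposition_mapping is made up for illustration.

```python
import unicodedata

def decomposition_mapping(ch: str) -> str:
    """Return the decomposition mapping recorded for ch, or '' if there is none."""
    return unicodedata.decomposition(ch)

# Compare U+00F8 (o with stroke) with U+00F6 (o with diaeresis).
for ch in ("\u00f8", "\u00f6"):
    name = unicodedata.name(ch)
    mapping = decomposition_mapping(ch) or "(none)"
    nfd = unicodedata.normalize("NFD", ch)
    print(f"U+{ord(ch):04X} {name}: mapping={mapping}, NFD code points={len(nfd)}")
```

For “ø” the mapping field is empty and normalization to NFD leaves the character unchanged, whereas “ö” maps to U+006F followed by U+0308.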

Similarly, the letters “ł” and “đ” were encoded as “atomic.” To some extent,
it’s just the way it is, but I think I can see the reasoning behind
this. Although strokes across letters are comparable to diacritic marks
in a sense, and surely historically, they also differ from them in
essential ways. They cross over letters instead of just sitting above,
below, or otherwise near a base letter. Perhaps more importantly, they
differ in placement, width, and angle: compare e.g. “ø”, “ł”, and “đ”
with each other. If the stroke were defined as a diacritic, its identity
would be rather vague.
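
A practical consequence is that a base letter followed by a combining overlay never normalizes to the stroked letter. Here is a rough sketch of that check, again assuming Python 3 and its standard unicodedata module; is_atomic and composes_to are hypothetical helper names, not anything defined by Unicode.

```python
import unicodedata

def is_atomic(ch: str) -> bool:
    """True if the character database records no decomposition mapping for ch."""
    return unicodedata.decomposition(ch) == ""

def composes_to(base: str, mark: str, target: str) -> bool:
    """True if NFC normalization turns base + mark into the single character target."""
    return unicodedata.normalize("NFC", base + mark) == target

# The stroked letters mentioned above: ø, ł, đ.
for ch in ("\u00f8", "\u0142", "\u0111"):
    print(f"U+{ord(ch):04X} atomic: {is_atomic(ch)}")

# o + COMBINING LONG SOLIDUS OVERLAY (U+0338) is not canonically equivalent to ø,
# while o + COMBINING DIAERESIS (U+0308) is canonically equivalent to ö.
print(composes_to("o", "\u0338", "\u00f8"))
print(composes_to("o", "\u0308", "\u00f6"))
```

So software that compares strings after normalization treats “o” plus U+0337 or U+0338 as distinct from “ø”, however similar the two may look when rendered.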

Received on Thu Aug 16 2012 - 10:57:45 CDT
