Re: Why no combining‐character form for U+00F8?

From: Ken Whistler <kenw_at_sybase.com>
Date: Thu, 16 Aug 2012 10:29:42 -0700

On 8/16/2012 9:32 AM, Erkki I Kolehmainen wrote:
> Although the stroke is not a diacritic, keyboard drivers can be made to generate atomic characters with stroke by using a dead letter key for stroke together with the base character.

And in addition to this observation by Erkki, it is also the case that
collation tables may (and often do) depart from the details of the
formal decompositions defined in UnicodeData.txt (and displayed in the
Unicode code charts).
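
To see what the formal decompositions say for the two letters discussed
here, the following minimal sketch queries Python's standard
unicodedata module, which exposes the decomposition field of
UnicodeData.txt. The variable names are only illustrative.

```python
import unicodedata

o_stroke = "\u00f8"     # ø LATIN SMALL LETTER O WITH STROKE
o_diaeresis = "\u00f6"  # ö LATIN SMALL LETTER O WITH DIAERESIS

# Decomposition mapping as recorded in UnicodeData.txt
# (an empty string means the character has no decomposition).
stroke_decomp = unicodedata.decomposition(o_stroke)
diaeresis_decomp = unicodedata.decomposition(o_diaeresis)

# NFD applies canonical decompositions, so ø stays a single code point
# while ö becomes "o" followed by U+0308 COMBINING DIAERESIS.
stroke_nfd = unicodedata.normalize("NFD", o_stroke)
diaeresis_nfd = unicodedata.normalize("NFD", o_diaeresis)

print(repr(stroke_decomp), repr(diaeresis_decomp))
print([hex(ord(c)) for c in stroke_nfd])
print([hex(ord(c)) for c in diaeresis_nfd])
```

So at the level of the encoding, ø is atomic and ö is not; what
collation does with them is a separate matter.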

For example, for the letter in question, ø, the default weighting for
the Unicode Collation Algorithm treats the stroke as a secondary weight,
analogous to a diacritic. This is parallel to the treatment of the
diaeresis/umlaut on ö. So for searching and sorting (unless tailored
otherwise), a UCA-based implementation *does* treat those two letters as
diacritic modifications of an "o", rather than as completely separate
letters.
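
The effect of a secondary weight can be sketched with a toy two-level
sort key. This is not the UCA and does not use the real default table:
the WEIGHTS values below are made up for illustration, and only their
shape (same primary weight for o, ö and ø, differing secondary weights)
reflects the behavior described above.

```python
# Hypothetical (primary, secondary) weights; NOT the real UCA values.
WEIGHTS = {
    "a": (1, 0),
    "o": (10, 0),
    "\u00f6": (10, 1),  # ö: same primary as "o", secondary for diaeresis
    "\u00f8": (10, 2),  # ø: same primary as "o", secondary for stroke
    "p": (11, 0),
    "s": (12, 0),
    "t": (13, 0),
}

def primary_key(word):
    """Primary weights only: ignores diacritic-like differences."""
    return tuple(WEIGHTS[ch][0] for ch in word)

def sort_key(word):
    """All primaries are compared first, then all secondaries."""
    secondaries = tuple(WEIGHTS[ch][1] for ch in word)
    return (primary_key(word), secondaries)

words = ["s\u00f8t", "spa", "sot", "s\u00f6t"]
ordered = sorted(words, key=sort_key)
print(ordered)

# A search at primary strength treats "sot" and "søt" as a match.
loose_match = primary_key("sot") == primary_key("s\u00f8t")
print(loose_match)
```

With these assumed weights the three o-variants sort together ahead of
"spa", and differ from each other only at the secondary level.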

Such behavior is tailorable on a per-language basis, so the defaults are
just that -- defaults. And in any case, how this (or any other)
character is handled for searching and sorting (or for keyboards and
input) is somewhat orthogonal to the exact details of the character
encoding decisions per se.
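
As a rough illustration of tailoring, the sketch below overrides one
entry of a toy weight table so that ø gets its own primary weight after
z, which is where the Danish and Norwegian alphabets place it. The
weights and the make_key helper are hypothetical; a real implementation
would tailor the actual collation table rather than a dictionary like
this.

```python
# Hypothetical (primary, secondary) weights; NOT the real UCA values.
DEFAULT = {
    "l": (8, 0),
    "o": (10, 0),
    "\u00f8": (10, 2),  # default: ø is "o" plus a secondary difference
    "s": (12, 0),
    "t": (13, 0),
    "z": (20, 0),
}

# Tailoring: give ø a distinct primary weight that sorts after "z".
TAILORED = {**DEFAULT, "\u00f8": (22, 0)}

def make_key(table):
    """Build a two-level sort key function from a weight table."""
    def key(word):
        primaries = tuple(table[ch][0] for ch in word)
        secondaries = tuple(table[ch][1] for ch in word)
        return (primaries, secondaries)
    return key

words = ["zoo", "\u00f8l", "ost"]
default_order = sorted(words, key=make_key(DEFAULT))
tailored_order = sorted(words, key=make_key(TAILORED))
print(default_order)
print(tailored_order)
```

The encoded character is the same in both cases; only the table
consulted for sorting changes.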

--Ken
Received on Thu Aug 16 2012 - 12:36:37 CDT
