Re: Combining latin small letters with diacritics

From: Ken Whistler
Date: Mon, 05 Mar 2012 14:46:42 -0800

On 3/5/2012 2:32 PM, Denis Jacquerye wrote:
> I guess it's less messy than other situations. I just couldn't help
> wondering why combining letters with diacritics are being encoded but
> letters with diacritics are out of the question.

Because the combining ones are *not* decomposed, and hence don't
have normalization issues. (At least as long as we don't start down the
inadvisable path of encoding decomposable ones...)
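
For anyone who wants to check this, here is a minimal sketch using Python's
standard unicodedata module. It takes U+1DD7 COMBINING LATIN SMALL LETTER
C CEDILLA as a representative combining letter with a diacritic; the choice of
character and the helper names are illustrative assumptions, and the results
reflect whatever Unicode version the Python build ships with.

```python
import unicodedata

# U+1DD7 COMBINING LATIN SMALL LETTER C CEDILLA, chosen as an example only.
COMBINING_C_CEDILLA = "\u1dd7"

def has_decomposition(ch: str) -> bool:
    """True if the character has any decomposition mapping in the UCD."""
    return unicodedata.decomposition(ch) != ""

def is_normalization_stable(text: str) -> bool:
    """True if the text is unchanged by all four normalization forms."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return all(unicodedata.normalize(form, text) == text for form in forms)

# The combining letter has no decomposition mapping...
combining_decomposes = has_decomposition(COMBINING_C_CEDILLA)
# ...so a sequence that uses it passes through normalization untouched.
sequence_is_stable = is_normalization_stable("a" + COMBINING_C_CEDILLA)
# The ordinary base letter U+00E7 (c with cedilla) does decompose.
base_decomposes = has_decomposition("\u00e7")
```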

The base letters *are* decomposed, and have been so forever in the standard,
essentially. Because of that, base+diacritic and <base, combining-diacritic>
*are* normalized together. And because the decomposed form is
already present (and the normalized form will always be that), there
is nothing to gain by encoding precomposed versions of new base letters
of this sort.
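
A small sketch of both halves of that, again with Python's unicodedata
module. U+00E9 stands in for an existing precomposed letter, and
<U+025B, U+0301> (open e plus acute) for a base letter that has no
precomposed form; both picks are illustrative assumptions.

```python
import unicodedata

def nfc(text: str) -> str:
    return unicodedata.normalize("NFC", text)

def nfd(text: str) -> str:
    return unicodedata.normalize("NFD", text)

# Existing precomposed letter: U+00E9 and <e, U+0301> are canonically
# equivalent, so each normalizes to the other.
precomposed = "\u00e9"
sequence = "e\u0301"
decomposed_form = nfd(precomposed)   # <e, U+0301>
composed_form = nfc(sequence)        # U+00E9

# Base letter without a precomposed form: the combining sequence is already
# the normalized form under both NFC and NFD.
open_e_acute = "\u025b\u0301"
open_e_nfc = nfc(open_e_acute)
open_e_nfd = nfd(open_e_acute)
```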

Normalization was never designed to recurse through base letters used
as combining marks. And the very few combining marks which *do* have
decompositions (see, e.g., U+0344 for a notorious example) have made
implementers' lives a misery. They create special-case funkiness in
normalization, testing, collation, ...
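
To see the U+0344 behavior concretely, the following sketch (Python's
unicodedata again; variable names are only illustrative) shows that the single
combining mark turns into two marks under normalization, and that after a
Greek iota the pieces recompose into a different precomposed letter.

```python
import unicodedata

# U+0344 COMBINING GREEK DIALYTIKA TONOS
DIALYTIKA_TONOS = "\u0344"

# Its canonical decomposition mapping, as space-separated code points:
# U+0308 COMBINING DIAERESIS followed by U+0301 COMBINING ACUTE ACCENT.
mapping = unicodedata.decomposition(DIALYTIKA_TONOS)

# Neither NFD nor NFC keeps U+0344; one mark goes in, two come out.
nfd_marks = unicodedata.normalize("NFD", DIALYTIKA_TONOS)
nfc_marks = unicodedata.normalize("NFC", DIALYTIKA_TONOS)

# After a base letter the decomposed pieces can recompose with the base:
# <U+03B9, U+0344> becomes U+0390 under NFC.
iota_nfc = unicodedata.normalize("NFC", "\u03b9" + DIALYTIKA_TONOS)
```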

Received on Mon Mar 05 2012 - 16:49:20 CST