From: Michael Everson
Date: Tue Sep 28 1999 - 08:28:30 EDT

At 14:12 -0500 1999-09-27, Scott Horne wrote:
>Michael Everson wrote:
>> In fact it
>> is _easier_ to support languages (for things like matching and searching)
>> if you don't have to _also_ normalize between a precomposed and a
>> decomposed form.
>I agree. That's why it should've been all or none. If we can't have
>precomposed _ç_-overdot, we shouldn't have precomposed _ç_.

Irrelevant. The merger of pristine-and-pure Unicode with legacy-supporting
ISO 10646 gave us the situation we have today with a precomposed _ç_. We
have it; saying we shouldn't is a waste of breath. If we normalize, then we
don't have a problem with languages like French and Manx which use ç (or
c¸). For a language like Chechen, they won't have any problems at all
_unless_ they insist on precomposed forms.
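To make the matching point concrete, here is a minimal sketch of what normalizing buys you. It assumes a present-day environment (Python's standard `unicodedata` module); the helper name `same_text` and the c-cedilla-with-dot-above sequence are only illustrations, not anything defined in this thread.

```python
import unicodedata

precomposed = "\u00e7"   # ç as a single code point (LATIN SMALL LETTER C WITH CEDILLA)
decomposed = "c\u0327"   # c followed by COMBINING CEDILLA

def same_text(a: str, b: str, form: str = "NFC") -> bool:
    """Hypothetical helper: compare two strings after normalizing both."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

# The raw code point sequences differ, but they match once normalized.
raw_equal = precomposed == decomposed
normalized_equal = same_text(precomposed, decomposed)

# Illustrative letter with no precomposed code point: c + cedilla + dot above.
# NFC composes only as far as it can, leaving the dot above as a combining mark.
overdot = unicodedata.normalize("NFC", "c\u0327\u0307")

# The order in which the two marks were typed does not matter after normalizing.
order_independent = same_text("c\u0307\u0327", "c\u0327\u0307")
```

A letter without a precomposed form still compares and searches consistently; it simply stays partly decomposed after normalization.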

>If that
>means that there's no unambiguous round-trip conversion between Unicode
>and some hypothetical encoding with both _ç_ and combining cedilla, so be it.

A rather cavalier attitude, if you ask me. Personally I prefer normalization.

>> Meaning that there were _technical_ reasons for drawing a line at the
>> normalization border. The line was not drawn for political or socioeconomic
>> reasons as you state.
>Have you forgotten the huge battle that was waged on this list
>(successfully, I'm glad to say) eight or nine years ago to get
>a few dozen diacritically marked Vietnamese letters added to
>the UCS?

Um, at that time the normalization hadn't been done. So at that time there
weren't _technical_ reasons for drawing a line at the normalization border.
The line was drawn after that time. It could have been before. But it has
been drawn and there had better be really good reasons offered if we are
not to respect it.

