Re: *Why* are precomposed characters required for "backward compatibility"?

From: Kenneth Whistler (kenw@sybase.com)
Date: Thu Jul 11 2002 - 22:35:03 EDT


Dan Oscarsson said:

> NFD should not be an extension of ASCII. There are several spacing
> accents in ASCII that should be decomposed just like the spacing
> accents in ISO 8859-1 are decomposed.
> All or none spacing accents should be decomposed.

In addition to the usage clarifications made by John Cowan and
David Hopwood, I should point out a little history here.

As of Unicode 2.0, some compatibility decompositions were still
provided for U+005E CIRCUMFLEX ACCENT, U+005F LOW LINE, and
U+0060 GRAVE ACCENT, along the lines suggested by Dan. However,
when normalization forms were being established and standardized
in the Unicode 3.0 time frame, it became obvious that these
particular compatibility decompositions would lead to trouble.

Any Unicode normalization form that would not leave ASCII values
unchanged would have been DOA (dead on arrival), because of its
potential impact on widely used syntax characters in countless
formal syntaxes. The equating of U+005F LOW LINE with a combining
low line applied to a SPACE was particularly problematical, since
LOW LINE is so widely accepted as an element of identifiers.
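
To make the identifier problem concrete, here is a minimal sketch in
Python. The mapping table and the helper apply_withdrawn_mapping are
hypothetical, invented purely for illustration: they imitate the
withdrawn LOW LINE decomposition (SPACE followed by U+0332 COMBINING
LOW LINE) and are not part of any real normalization API.

```python
# Hypothetical table imitating the withdrawn compatibility decomposition
# of U+005F LOW LINE into SPACE + U+0332 COMBINING LOW LINE.
WITHDRAWN_LOW_LINE = {"_": "\u0020\u0332"}


def apply_withdrawn_mapping(text: str) -> str:
    """Apply the hypothetical decomposition to every character of text."""
    return "".join(WITHDRAWN_LOW_LINE.get(ch, ch) for ch in text)


source = "max_len = 80"
mangled = apply_withdrawn_mapping(source)

# Whitespace tokenization: the identifier is split in two by the new SPACE.
tokens_before = source.split()
tokens_after = mangled.split()

# Python's own identifier check tells the same story.
was_identifier = "max_len".isidentifier()
still_identifier = apply_withdrawn_mapping("max_len").isidentifier()

print(tokens_before, tokens_after, was_identifier, still_identifier)
```

Under that assumed mapping, a single identifier turns into two tokens,
the second of which starts with a combining mark, so any formal syntax
that allows LOW LINE in identifiers would see its input change meaning
under normalization.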

Because of these complications, the three compatibility decompositions
were withdrawn by the UTC (unanimously, if I recall correctly),
*before* the normalization forms were finally standardized.
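
The outcome can be checked with a short sketch, assuming Python's
standard unicodedata module (which tracks a later Unicode version than
the one under discussion here). The choice of U+00B4 ACUTE ACCENT as a
representative ISO 8859-1 spacing accent is mine, for illustration only.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

# Every ASCII code point, including ^ (U+005E), _ (U+005F) and ` (U+0060).
ascii_text = "".join(chr(cp) for cp in range(0x80))

# True if all four normalization forms leave the ASCII range untouched.
ascii_is_stable = all(
    unicodedata.normalize(form, ascii_text) == ascii_text for form in FORMS
)

# U+00B4 ACUTE ACCENT, a spacing accent from ISO 8859-1.
acute = "\u00B4"
nfd_acute = unicodedata.normalize("NFD", acute)    # canonical decomposition only
nfkd_acute = unicodedata.normalize("NFKD", acute)  # adds compatibility decomposition

print(ascii_is_stable)
print([f"U+{ord(ch):04X}" for ch in nfd_acute])
print([f"U+{ord(ch):04X}" for ch in nfkd_acute])
```

In this sketch the ASCII range survives every form unchanged, while
U+00B4 is left alone by NFD and expands to SPACE plus U+0301 COMBINING
ACUTE ACCENT only under the compatibility forms.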

Consistency in treatment would be nice, but consistency in
treatment of the multiply ambiguous ASCII characters of this ilk
is impossible at this point. And it would have been very, very, very,
veeeery bad for normalization to have allowed these three, in particular,
to have decompositions.

--Ken


