Re: *Why* are precomposed characters required for "backward compatibility"?

From: Kenneth Whistler
Date: Tue Jul 09 2002 - 21:06:35 EDT

David Hopwood wrote:

> Marco Cimarosti wrote:
>
> > BTW, they always sold me that precomposed accented letters exist in Unicode
> > only because of backward compatibility with existing standards.
>
> I don't get that argument. It is not difficult to round-trip convert between
> NFD and a non-Unicode standard that uses precomposed characters. Round-trip
> convertibility of strings does not imply round-trip convertibility of
> individual characters, and I don't see why the latter would be necessary.

Because while it is conceptually not difficult to round-trip convert between
legacy accented Latin characters and Unicode NFD combining character sequences,
in practice many Unicode implementations would never have gotten off the
ground if they had had to start with combining character sequences for
all Latin letters, including, in particular, the 8859 repertoires. And
the character mapping tables are considerably more complex, in practice, if
they must map 1-n and n-1, rather than 1-1. Right now, a Latin-1 to Unicode
mapping table is trivial, but if Latin-1 had not been covered with a set
of precomposed characters, the mapping would *not* have been trivial, and that
would have been a significant barrier to early Unicode adoption. And people
would *still* be complaining -- vigorously -- about the performance hit
and maintenance complexity of interoperating with 8859 and common PC
code pages.
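
To make the 1-1 versus 1-n contrast concrete, here is a minimal sketch in
Python. It assumes the standard unicodedata module as a stand-in for real
conversion tables; the names nfd_table and multi are hypothetical, and the
"NFD-only" table is a thought experiment, not anything that was ever
specified.

```python
import unicodedata

# With precomposed characters, the Latin-1 to Unicode table is the identity:
# byte value N maps to code point U+00NN.
latin1_bytes = bytes(range(256))
decoded = latin1_bytes.decode("latin-1")
is_identity = all(ord(ch) == b for ch, b in zip(decoded, latin1_bytes))

# Hypothetical NFD-only table: each byte maps to a *sequence* of code points.
nfd_table = {b: unicodedata.normalize("NFD", chr(b)) for b in range(256)}

# The entries that become 1-n (base letter plus combining mark).
multi = {b: seq for b, seq in nfd_table.items() if len(seq) > 1}

# Going back is n-1: sequences must be matched and recomposed before
# encoding. Here that work is delegated to NFC normalization.
nfd_text = unicodedata.normalize("NFD", decoded)
round_trip = unicodedata.normalize("NFC", nfd_text).encode("latin-1")

print(is_identity)                               # table is 1-1 today
print([hex(ord(c)) for c in nfd_table[0xE9]])    # e-acute as base + mark
print(round_trip == latin1_bytes)                # string round trip holds
```

The round trip does succeed at the string level, as David says, but only
because a normalization step sits on each side of the conversion. That step
is the extra machinery a plain 1-1 table avoids.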

> The only difficulty would have been if a pre-existing standard had supported
> both precomposed and decomposed encodings of the same combining mark. I don't
> think there are any such standards (other than Unicode as it is now), are
> there?

Not to my knowledge.

> (Obviously, an NFD-only Unicode would not have been an extension of ISO-8859-1.
> That wouldn't have been much of a loss; it would still have been an extension
> of US-ASCII.)
>
> > If this compatibility issue didn't exist, Unicode would be like NFD.
>
> And would have been much simpler and better for it, IMHO.

It would have been better, in some respects, to treat Latin like the
complex script it is, and to end up with the same kind of clean,
by-the-principles encoding that Unicode has for Devanagari, essentially
free of equivalences and normalization difficulties. But it took years
for major platforms to get up to speed on complex script rendering,
including the relatively simple but elusive prospect of dynamic
application of diacritics to Latin letters (and/or mapping of
combining character sequences to preformed complex glyphs).

And despite the vigorous advocacy by some factions of early Unicoders
to have a consistent, decomposed Latin representation in Unicode, there
were some rather hard-headed decisions made early on (1989) that such an approach
would cripple what was then an experimental encoding. The inclusion of large
numbers of precomposed Latin letters as encoded characters was the
price for the participation of IBM, Microsoft, and the Unix vendors,
and was also the price for the possibility of alignment of Unicode with
an ISO international standard. Without paying those prices, Unicode
would not exist today, in my opinion.


> --
> David Hopwood
