Marco Cimarosti wrote:
> Theodore H. Smith wrote:
> > [...] If I didn't know what a composite was, I'd guess it was the same
> > thing as a combining sequence.
> >
> > However, the two are meant to be different, so it can't be the same.
> They are meant to have exactly the same meaning, appearance and behavior.
> The difference is only inside the computer's memory, and should be invisible
> to users.
> The purpose of the normalization algorithm above is to get rid of this
> useless difference:
> - Normalization Form D (NFD) turns any precomposed accented letter into a
> letter + accent sequence.
> - Normalization Form C (NFC) turns any letter + accent sequence into a
> precomposed accented letter, if one exists.
> BTW, they always sold me that precomposed accented letters exist in Unicode
> only because of backward compatibility with existing standards.
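
For anyone following along, here is a small sketch of the two forms described above, using Python's standard unicodedata module. The variable names are mine and purely illustrative, and "q" + acute is just one example I picked of a sequence that (as far as I know) has no precomposed equivalent.

```python
import unicodedata

precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE (one code point)
sequence = "e\u0301"     # "e" followed by COMBINING ACUTE ACCENT (two code points)

# NFD turns the precomposed letter into letter + accent.
nfd = unicodedata.normalize("NFD", precomposed)

# NFC turns letter + accent into the precomposed letter, if one exists.
nfc = unicodedata.normalize("NFC", sequence)

# When no precomposed letter exists, NFC leaves the sequence alone.
no_precomposed = unicodedata.normalize("NFC", "q\u0301")

print(nfd == sequence, nfc == precomposed, len(no_precomposed))
```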

I don't get that argument. It is not difficult to round-trip convert between
NFD and a non-Unicode standard that uses precomposed characters. Round-trip
convertibility of strings does not imply round-trip convertibility of
individual characters, and I don't see why the latter would be necessary.
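
To make that concrete, here is a rough sketch of such a converter, assuming ISO-8859-1 as the precomposed standard and Python's unicodedata module for the normalization. The function names latin1_to_nfd and nfd_to_latin1 are made up for this example; it illustrates the argument and is not how any real converter is specified.

```python
import unicodedata

def latin1_to_nfd(data: bytes) -> str:
    """Decode ISO-8859-1 bytes, then decompose every accented letter."""
    return unicodedata.normalize("NFD", data.decode("latin-1"))

def nfd_to_latin1(text: str) -> bytes:
    """Recompose letter + accent sequences, then encode as ISO-8859-1."""
    return unicodedata.normalize("NFC", text).encode("latin-1")

original = "caf\u00e9".encode("latin-1")   # four bytes, the last one is 0xE9
decomposed = latin1_to_nfd(original)       # "cafe" followed by U+0301

# The string as a whole survives the round trip.
round_tripped = nfd_to_latin1(decomposed)

# The individual characters do not: the combining accent on its own
# has no ISO-8859-1 equivalent.
try:
    decomposed[-1].encode("latin-1")
    accent_alone_converts = True
except UnicodeEncodeError:
    accent_alone_converts = False

print(round_tripped == original, accent_alone_converts)
```

The round trip works at the string level even though U+0301 by itself cannot be converted, which is all I am claiming.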

The only difficulty would have been if a pre-existing standard had supported
both precomposed and decomposed encodings of the same combining mark. I don't
think there are any such standards (other than Unicode as it is now), are there?

(Obviously, an NFD-only Unicode would not have been an extension of ISO-8859-1.
That wouldn't have been much of a loss; it would still have been an extension

> If this compatibility issue didn't exist, Unicode would be like NFD.

And would have been much simpler and better for it, IMHO.

