*Why* are precomposed characters required for "backward compatibility"?

From: David Hopwood (david.hopwood@zetnet.co.uk)
Date: Tue Jul 09 2002 - 21:07:15 EDT


Marco Cimarosti wrote:
> Theodore H. Smith wrote:
> > [...] If I didn't know what a composite was, I'd guess it was the same
> > thing as a combining sequence.
> >
> > However, the two are meant to be different, so it can't be the same.
> They are meant to have exactly the same meaning, appearance and behavior.
> The difference is only inside the computer's memory, and should be invisible
> to users.
> The purpose of the normalization algorithm above is to get rid of this
> useless difference:
> - Normalization Form D (NFD) turns any precomposed accented letter into a
> letter + accent sequence.
> - Normalization Form C (NFC) turns any letter + accent sequence into a
> precomposed accented letter, if one exists.
> BTW, they always sold me that precomposed accented letters exist in Unicode
> only because of backward compatibility with existing standards.

I don't get that argument. It is not difficult to round-trip convert between
NFD and a non-Unicode standard that uses precomposed characters. Round-trip
convertibility of strings does not imply round-trip convertibility of
individual characters, and I don't see why the latter would be necessary.
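
As a rough illustration of the string-level round trip, here is a minimal
Python sketch. It assumes ISO-8859-1 as the legacy standard and uses the
NFC/NFD tables in Python's unicodedata module as a stand-in for the mapping
table a converter would carry; the function names legacy_to_nfd and
nfd_to_legacy are made up for this example.

```python
import unicodedata


def legacy_to_nfd(data: bytes, encoding: str = "latin-1") -> str:
    """Decode legacy precomposed bytes and decompose the result to NFD."""
    return unicodedata.normalize("NFD", data.decode(encoding))


def nfd_to_legacy(text: str, encoding: str = "latin-1") -> bytes:
    """Recompose letter + accent sequences, then encode to the legacy set."""
    return unicodedata.normalize("NFC", text).encode(encoding)


# Three accented letters, each a single byte in ISO-8859-1.
original = "caf\u00e9 \u00e0 la cr\u00e8me".encode("latin-1")
nfd = legacy_to_nfd(original)

# The string survives the round trip byte for byte...
round_tripped = nfd_to_legacy(nfd)

# ...even though each accented letter became two code points in NFD,
# so there is no one-to-one mapping of individual characters.
extra_code_points = len(nfd) - len(original)
```

Here round_tripped compares equal to original although the NFD string is
longer than the legacy one, which is the distinction between converting
strings and converting individual characters.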

The only difficulty would have been if a pre-existing standard had supported
both precomposed and decomposed encodings of the same combining mark. I don't
think there are any such standards (other than Unicode as it is now), are there?

(Obviously, an NFD-only Unicode would not have been an extension of ISO-8859-1.
That wouldn't have been much of a loss; it would still have been an extension

> If this compatibility issue didn't exist, Unicode would be like NFD.

And would have been much simpler and better for it, IMHO.

--
David Hopwood <david.hopwood@zetnet.co.uk>

Home page & PGP public key: http://www.users.zetnet.co.uk/hopwood/
RSA 2048-bit; fingerprint 71 8E A6 23 0E D3 4C E5 0F 69 8C D4 FA 66 15 01
Nothing in this message is intended to be legally binding. If I revoke a
public key but refuse to specify why, it is because the private key has been
seized under the Regulation of Investigatory Powers Act; see www.fipr.org/rip


