*Why* are precomposed characters required for "backward compatibility"?

From: David Hopwood (david.hopwood@zetnet.co.uk)
Date: Tue Jul 09 2002 - 21:07:15 EDT


Marco Cimarosti wrote:
> Theodore H. Smith wrote:
> > [...] If I didn't know what a composite was, I'd guess it was the same
> > thing as a combining sequence.
> >
> > However, the two are meant to be different, so it can't be the same.
>
> They are meant to have exactly the same meaning, appearance and behavior.
> The difference is only inside the computer's memory, and should be invisible
> to users.
>
> The purpose of the normalization algorithm above is to get rid of this
> useless difference:
>
> - Normalization Form D (NFD) turns any precomposed accented letter into a
> letter + accent sequence.
>
> - Normalization Form C (NFC) turns any letter + accent sequence into a
> precomposed accented letter, if one exists.
>
> BTW, they always sold me that precomposed accented letters exist in Unicode
> only because of backward compatibility with existing standards.

I don't get that argument. It is not difficult to round-trip convert between
NFD and a non-Unicode standard that uses precomposed characters: the converter
maps each precomposed character to its decomposed sequence, and each such
sequence back to the precomposed character. Round-trip convertibility of
strings does not imply round-trip convertibility of individual characters,
and I don't see why the latter would be necessary.
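
A minimal sketch of such a converter, assuming ISO-8859-1 as the legacy standard and using Python's `unicodedata` only to build the tables (the table and function names are hypothetical, not from any real converter). Each precomposed byte maps to a multi-code-point sequence, yet whole strings round-trip exactly:

```python
import unicodedata

# Hypothetical converter tables between ISO-8859-1 and an NFD-only Unicode.
# They are built with unicodedata for convenience; a real converter would
# ship them as static data. Each byte maps to a sequence of 1 or 2 code points.
TO_NFD = {b: unicodedata.normalize("NFD", bytes([b]).decode("latin-1"))
          for b in range(256)}
FROM_NFD = {seq: b for b, seq in TO_NFD.items()}
MAX_SEQ = max(len(seq) for seq in FROM_NFD)


def latin1_to_nfd(data: bytes) -> str:
    """Decode ISO-8859-1 bytes straight into decomposed (NFD) text."""
    return "".join(TO_NFD[b] for b in data)


def nfd_to_latin1(text: str) -> bytes:
    """Encode NFD text back to ISO-8859-1, recomposing letter + accent pairs."""
    out = bytearray()
    i = 0
    while i < len(text):
        # Greedy longest match, so "e" + U+0301 becomes 0xE9 rather than "e".
        for n in range(min(MAX_SEQ, len(text) - i), 0, -1):
            b = FROM_NFD.get(text[i:i + n])
            if b is not None:
                out.append(b)
                i += n
                break
        else:
            raise ValueError(f"not representable in ISO-8859-1: {text[i]!r}")
    return bytes(out)


if __name__ == "__main__":
    original = "caf\u00e9 na\u00efve \u00c5ngstr\u00f6m".encode("latin-1")
    decomposed = latin1_to_nfd(original)
    print(len(original), len(decomposed))  # prints: 19 23
    assert nfd_to_latin1(decomposed) == original  # the string round-trips
```

The per-character mapping is one-to-many (0xE9 becomes two code points), but the string-level conversion is lossless in both directions.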

The only difficulty would arise if a pre-existing standard had supported
both precomposed and decomposed encodings of the same accented character. I
don't think there are any such standards (other than Unicode as it is now),
are there?
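
To illustrate that difficulty, a short sketch using present-day Unicode as the "standard with both encodings" (the helper `nfd_collisions` is hypothetical). Distinct source strings that share one NFD form cannot be restored individually by a reverse converter:

```python
import unicodedata
from collections import defaultdict


def nfd_collisions(strings):
    """Group distinct source strings that map to the same NFD sequence.

    A group with more than one member cannot survive a round trip through
    an NFD-only encoding: the reverse converter cannot tell which member
    to restore.
    """
    groups = defaultdict(set)
    for s in strings:
        groups[unicodedata.normalize("NFD", s)].add(s)
    return {nfd: members for nfd, members in groups.items() if len(members) > 1}


# Present-day Unicode encodes "é" both as U+00E9 and as "e" + U+0301.
samples = ["\u00e9", "e\u0301", "e"]
for nfd, members in nfd_collisions(samples).items():
    print(ascii(nfd), "<-", sorted(ascii(m) for m in members))
```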

(Obviously, an NFD-only Unicode would not have been an extension of ISO-8859-1,
since letters such as 0xE9 (e acute) would have had no single code point.
That wouldn't have been much of a loss; it would still have been an extension
of US-ASCII.)
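
A quick check of that claim, assuming today's NFD can stand in for the repertoire of a hypothetical NFD-only Unicode (`changed_by_nfd` is an illustrative helper). No ASCII character changes under NFD, while many ISO-8859-1 letters do:

```python
import unicodedata


def changed_by_nfd(codepoints):
    """Return the characters whose NFD form differs from the character itself."""
    return [chr(cp) for cp in codepoints
            if unicodedata.normalize("NFD", chr(cp)) != chr(cp)]


ascii_changed = changed_by_nfd(range(0x80))
latin1_changed = changed_by_nfd(range(0x80, 0x100))

print(len(ascii_changed))  # prints: 0
print(len(latin1_changed), ascii("".join(latin1_changed)))
```

Letters without a canonical decomposition (for example 0xDF sharp s or 0xE6 ae) stay single code points, so only the accented letters would lose their one-to-one correspondence with Latin-1.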

> If this compatibility issue didn't exist, Unicode would be like NFD.

And would have been much simpler and better for it, IMHO.

--
David Hopwood <david.hopwood@zetnet.co.uk>

Home page & PGP public key: http://www.users.zetnet.co.uk/hopwood/
RSA 2048-bit; fingerprint 71 8E A6 23 0E D3 4C E5 0F 69 8C D4 FA 66 15 01
Nothing in this message is intended to be legally binding. If I revoke a
public key but refuse to specify why, it is because the private key has been
seized under the Regulation of Investigatory Powers Act; see www.fipr.org/rip

This archive was generated by hypermail 2.1.2 : Tue Jul 09 2002 - 18:38:28 EDT