Re: Normalization and the principled U-turn

From: John Cowan (cowan@locke.ccil.org)
Date: Tue Sep 28 1999 - 15:14:27 EDT


Michael Everson scripsit:

> I have noted that some provision for versioning of the normalization
> algorithms is presaged in UTR#15.

Only in the sense that new precomposed characters *may* be introduced,
in which case old apps will not know how to decompose them. But old
apps will not know what to do with these characters in any other way
either. New versions can never make normalized documents (that do
not contain any undefined codepoints) into non-normalized ones.
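
To make that concrete, here is a minimal sketch using Python's unicodedata
module (ucd_3_2_0 exposes the frozen Unicode 3.2 tables; U+1B06 BALINESE
LETTER AKARA TEDUNG stands in, by assumption, for a precomposed character
whose decomposition was added after that freeze):

    import unicodedata

    # U+1B06 BALINESE LETTER AKARA TEDUNG: assumed here as an example of a
    # precomposed letter whose canonical decomposition was added well after
    # the Unicode 3.2 tables were frozen.
    later_addition = "\u1b06"

    # An "old" implementation (here the Unicode 3.2 database that Python
    # ships as unicodedata.ucd_3_2_0) treats the code point as unassigned
    # and passes it through untouched; it cannot decompose it.
    old = unicodedata.ucd_3_2_0
    print(ascii(old.normalize("NFD", later_addition)))           # '\u1b06'

    # A current implementation knows the decomposition.
    print(ascii(unicodedata.normalize("NFD", later_addition)))   # '\u1b05\u1b35'

    # Stability: a document normalized under the old version contains only
    # code points the old version defined, already fully decomposed, so
    # re-normalizing under the newer version changes nothing.
    old_doc = old.normalize("NFD", "caf\u00e9")                  # 'cafe\u0301'
    assert unicodedata.normalize("NFD", old_doc) == old_doc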

> It makes me shudder, thinking what my (fictitious) typesetting program,
> 2000 Unicode XPress Version 3, would do when presented with post-Unicode-3.0
> data which has the æ-grave precomposed as well as æ with combining grave.
> Easy to tell:

If your application insists that its input be in normalized form
(which form is irrelevant to this case), then you will never see this
LATIN SMALL LETTER AE WITH GRAVE, but will always receive LATIN SMALL
LETTER AE followed by COMBINING GRAVE. This is the point of restricting
recomposition to Unicode 3.0 forms forever; post-3.0 precomposed
characters can never appear in any normalized form.
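
For what it's worth, a later real-world instance can be sketched with
Python's unicodedata; U+2ADC FORKING is, to the best of my knowledge, a
post-composition-version precomposed character and therefore excluded
from recomposition:

    import unicodedata

    # U+2ADC FORKING has the canonical decomposition U+2ADD + U+0338, but it
    # was added after the composition version was frozen, so it is excluded
    # from recomposition.
    forking = "\u2adc"

    # Neither normalization form ever contains the precomposed character.
    print(ascii(unicodedata.normalize("NFD", forking)))  # '\u2add\u0338'
    print(ascii(unicodedata.normalize("NFC", forking)))  # '\u2add\u0338'

    # An application that requires normalized input therefore only ever
    # sees the decomposed sequence.
    assert "\u2adc" not in unicodedata.normalize("NFC", forking)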

Of course, if the application insists on doing its own normalization,
you may have problems with post-3.0 data.

-- 
John Cowan                                   cowan@ccil.org
       I am a member of a civilization. --David Brin


