L2/01-004

From: Martin J. Duerst [duerst@w3.org]
Sent: Wednesday, December 20, 2000 8:16 PM
Subject: Re: The Impact of Normalization

At 00/12/20 16:06 -0800, Asmus Freytag wrote:
>At 12:59 PM 12/20/00 -0800, Paul Hoffman / IMC wrote:
>>>The problem with "fixing" a bug like this now in CompositionExclusions.txt
>>>is that it introduces equal and opposite (or some might claim
>>>"worse and opposite") bugs in implementations.
>>
>>I am most certainly among that "some", and I would expect anyone who
>>writes protocols that would rely on the Unicode Consortium's
>>normalization rules to join me. Outside bodies rely on the Unicode
>>Consortium's commitment not to change normalizations; that reliance could
>>disappear very quickly if a change was made.
>
>The 'fixed' nature of the *compatibility* normalization forms was a lot
>less well understood than for the canonical forms. The argument went like
>this: W3C needs early normalization, they want to use form C, which needs
>to be completely frozen by 3.0, and by the way we have all four forms in
>the TR. At least, since we didn't have the IDN discussion then, there was
>no equally compelling scenario against which to evaluate the normative
>nature of the K forms.

I definitely have to agree with Asmus here. There is much more need
for stability for C than for K. That's also why I'm much more
interested in U+fb1d, HEBREW LETTER YOD WITH HIRIQ, than in the
other one.

Does this combination appear in actual Hebrew text? Can somebody
give some example words? What other languages is it used in, and
with what frequency?

The question I'm asking is this: If it appears in actual Hebrew
texts, then it will appear there decomposed in legacy encodings.
A very important original goal of NFC was to have a high probability
that existing transcoders from legacy encodings would produce
normalized Unicode.
If it turns out that by fixing this bug, we get a much bigger part
of the affected text to conform than the other way round, it may be
worth considering.

The important thing is that what counts in the end is the overall
stability of the system, which may not necessarily be the same as
the stability of a single piece of data for a particular algorithm.

Of course it's very clear that such a change shouldn't be done
lightly. Implicit in some of the previous mails was the fear of
doing something as image-affecting as the Korean mess again. That
would be very bad. But the inability of an organization to fix
plainly obvious bugs (assuming this is one) can also lead to
criticism.

Anyway, I think that NFC is important enough to require this issue
to be considered carefully. I therefore propose that this be added
as an Agenda Item for the upcoming UTC. I have copied the UTC chairs.
If it's necessary to have an actual proposal to have this on the
agenda, I herewith propose (without prejudice) to:

- Add U+fb1d to the Composition Exclusion list.

Regards,   Martin.
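
The transcoder argument above can be checked with a short sketch. It is not part of the original message and assumes a Python interpreter with the standard `unicodedata` module; the names `LEGACY_SEQUENCE`, `code_points`, and `survives_nfc` are made up for illustration. The result depends on the Unicode data version bundled with the interpreter, which the script prints, so it shows how the library's tables treat U+FB1D rather than how they stood in December 2000.

```python
import unicodedata

# U+FB1D HEBREW LETTER YOD WITH HIRIQ, the precomposed character under discussion.
YOD_WITH_HIRIQ = "\uFB1D"
# Assumed output of a legacy transcoder: U+05D9 YOD followed by U+05B4 HIRIQ.
LEGACY_SEQUENCE = "\u05D9\u05B4"


def code_points(text):
    """Format a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def survives_nfc(text):
    """Return True if NFC normalization leaves the text unchanged."""
    return unicodedata.normalize("NFC", text) == text


if __name__ == "__main__":
    print("Unicode data version:", unicodedata.unidata_version)
    print("canonical decomposition:", unicodedata.decomposition(YOD_WITH_HIRIQ))
    print("NFC of legacy sequence:",
          code_points(unicodedata.normalize("NFC", LEGACY_SEQUENCE)))
    print("legacy sequence already NFC:", survives_nfc(LEGACY_SEQUENCE))
    print("NFC of precomposed form:",
          code_points(unicodedata.normalize("NFC", YOD_WITH_HIRIQ)))
```

If the bundled data lists U+FB1D in the Composition Exclusion list, the decomposed sequence should pass through NFC unchanged, which is the transcoder-friendly outcome argued for above. Without the exclusion, NFC would instead be expected to recompose the pair into U+FB1D, so decomposed legacy text would not already be normalized.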