From: Philippe Verdy (email@example.com)
Date: Thu Sep 20 2007 - 17:50:54 CDT
Misattribution, Asmus. I have not written, discussed, or quoted what you
are repeating here. You must have mixed up messages written by others.
> -----Original Message-----
> From: Asmus Freytag [mailto:firstname.lastname@example.org]
> Sent: Thursday, September 20, 2007 14:07
> To: email@example.com
> Cc: 'Jonathan Pool'; firstname.lastname@example.org
> Subject: Re: Normalization in panlingual application
> On 9/19/2007 6:04 PM, Philippe Verdy wrote:
> > Asmus Freytag [mailto:email@example.com] wrote:
> >> You realize, also, that it is not (in the general case) possible to
> >> apply normalization piece-meal. Because of that, breaking the text into
> >> runs and then normalizing can give different results (in some cases),
> >> which makes pre-processing a dicey option.
> > That's not my opinion.
> The result that for many strings s and t, NFxx(s) + NFxx(t) != NFxx(s +
> t) is not a matter of opinion. For these strings, you cannot normalize
> them separately and then concatenate, and expect the result to be the
> normalized form of the concatenated string. UAX #15 is rather clear about that.
> > At least the first step of the conversion (converting
> > to NFC) is very safe and preserves differences, using standard programs
> > (which are widely available, so this step represents no risk). Detecting
> > compatibility characters and mapping them to annotated forms can be done
> > after this step in a very straightforward way.
> I had written:
> > > Since none of the common libraries that implement normalization forms
> > > perform the necessary mappings to markup out of the box, anyone
> > > contemplating such a scheme would be forced to implement either a
> > > pre-processing step, or their own normalization logic. This is a
> > > downright scary suggestion, since such an approach would lose the
> > > benefit of using a well-tested implementation. Normalization is tricky
> > > enough that one should try not to implement it from scratch if at all
> > > possible.
> Your approach confirms what I suspected. By suggesting an approach like
> this, you are advocating a de novo implementation of the normalization
> transformation. By the way, NFC would be a poor starting point for your
> scheme, since all normalization forms start with an (implied) first step
> of applying *de*composition. But you can't even start with NFD, since
> the minute you decompose any compatibility characters in your following
> step, you can in principle create sequences that denormalize the
> existing NFD string around it. The work to handle these exceptions
> amounts to a full implementation of normalization, logically speaking.
> In other words, you've lost the benefit of your library.
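
The two technical claims in the quoted text can be checked with a short sketch. It assumes Python's standard unicodedata module as the "well-tested library"; the helper names (normalize_piecewise, normalize_whole, naive_compat_replace) and the sample strings are illustrative choices, not taken from the thread. The first part shows normalizing runs separately and then concatenating; the second shows an NFD string being denormalized when a compatibility character is decomposed in place.

```python
import unicodedata


def normalize_piecewise(form, *pieces):
    """Normalize each run on its own, then concatenate the results."""
    return "".join(unicodedata.normalize(form, p) for p in pieces)


def normalize_whole(form, *pieces):
    """Concatenate the runs first, then normalize the whole string."""
    return unicodedata.normalize(form, "".join(pieces))


def naive_compat_replace(text):
    """Hypothetical post-processing step: replace every character by its
    own compatibility decomposition, without reordering afterwards."""
    return "".join(unicodedata.normalize("NFKD", ch) for ch in text)


def codepoints(s):
    """Render a string as a list of code points for display."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


# NFC: a base letter in one run, a combining acute accent in the next.
nfc_piecewise = normalize_piecewise("NFC", "a", "\u0301")
nfc_whole = normalize_whole("NFC", "a", "\u0301")

# NFD: the second run starts with a mark of lower combining class (220)
# than the acute accent (230) that ends the first run.
nfd_piecewise = normalize_piecewise("NFD", "a\u0301", "\u0323")
nfd_whole = normalize_whole("NFD", "a\u0301", "\u0323")

# Compatibility step applied to a string that is already in NFD:
# U+00B4 ACUTE ACCENT has only a compatibility decomposition
# (SPACE + U+0301), so NFD leaves it alone.
nfd_text = "\u00b4\u0323"
naive = naive_compat_replace(nfd_text)
full = unicodedata.normalize("NFKD", nfd_text)

if __name__ == "__main__":
    print("NFC piecewise:", codepoints(nfc_piecewise))
    print("NFC whole    :", codepoints(nfc_whole))
    print("NFD piecewise:", codepoints(nfd_piecewise))
    print("NFD whole    :", codepoints(nfd_whole))
    print("naive compat :", codepoints(naive))
    print("library NFKD :", codepoints(full))
```

In each pair the two lines differ. In the last pair the in-place substitution leaves U+0301 ahead of U+0323, which violates canonical ordering, so the result would have to be normalized again as a whole.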