Re: Text in composed normalized form is king, right? Does anyone generate text in decomposed normalized form?

From: Richard Wordingham <richard.wordingham_at_ntlworld.com>
Date: Thu, 7 Feb 2013 08:57:38 +0000

On Thu, 7 Feb 2013 03:54:38 +0100
Philippe Verdy <verdy_p_at_wanadoo.fr> wrote:

> 2013/2/6 Richard Wordingham <richard.wordingham_at_ntlworld.com>:
> > On Wed, 6 Feb 2013 20:35:04 +0000
> > I <richard.wordingham_at_ntlworld.com> wrote:
> >
> >> The UCA default weighting necessarily has many 'defective'
> >> collation elements - every character forms a collating element!
> >
> > Correction: Every non-precomposed character forms a collating
> > element. <U+00E1 LATIN SMALL LETTER A WITH ACUTE> is *not* a
> > collating element in the default collation.
>
> At the primary collation level ?

The corrected statement and its example are both true. It is also
true that, at the *primary level*, adding <U+00E1> as a collating element
with the appropriate collation elements would make no difference to the
default collation.
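
To illustrate why such an addition would be redundant, here is a minimal
sketch in Python using only the standard unicodedata module. It assumes a
collator that first converts its input to NFD, as the UCA main algorithm
does; the helper name code_points is purely illustrative. After that step
the collator never sees <U+00E1>, only <U+0061, U+0301>.

```python
import unicodedata

PRECOMPOSED = "\u00e1"   # LATIN SMALL LETTER A WITH ACUTE
DECOMPOSED = "a\u0301"   # LATIN SMALL LETTER A + COMBINING ACUTE ACCENT


def code_points(s):
    """Render a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


# The canonical decomposition recorded in the Unicode Character Database.
mapping = unicodedata.decomposition(PRECOMPOSED)

# What an NFD-normalizing collator would actually process.
nfd = unicodedata.normalize("NFD", PRECOMPOSED)

print(mapping)            # 0061 0301
print(code_points(nfd))   # U+0061 U+0301
```

This says nothing about the weights themselves; it only shows that the
precomposed character has disappeared before any collating element is
looked up.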

> Reread what I wrote, this was a first
> condition (I voluntarily ignore all *ignorable* collation elements,
> i.e. ignorable at collation level 1).

You said, on 5 February,

"A process can be FULLY conforming by preserving the canonical
equivalence and treating ALL strings that are canonically equivalent,
without having to normalize them in any recommended form, or
performing any reordering in its backing store, or it can choose to
normalize to any other form that is convenient for that process (so it
could be NFC or NFD, or something else)"

There's no qualification there disqualifying collation at the secondary
level from being a 'process' which may or may not be conforming.
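
For concreteness, the kind of process the quoted statement describes can
be sketched as follows, again in Python with the standard unicodedata
module. This is only an assumption about what "preserving the canonical
equivalence" means for a plain equality test, not a collation; the
function name canonically_equivalent is hypothetical. The stored strings
are left untouched and are normalized only for the comparison.

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    """Compare two strings modulo canonical equivalence.

    Both arguments are normalized to NFD for the comparison only;
    the caller's stored strings are not modified.
    """
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# Three spellings of 'a' with dot below and acute accent.
acute_then_dot = "a\u0301\u0323"   # marks in non-canonical order
dot_then_acute = "a\u0323\u0301"   # marks in canonical order
partly_composed = "\u1ea1\u0301"   # U+1EA1 (a with dot below) + acute

# A different string: 'a' with grave accent and dot below.
grave_variant = "a\u0300\u0323"

print(acute_then_dot == dot_then_acute)                          # False
print(canonically_equivalent(acute_then_dot, dot_then_acute))    # True
print(canonically_equivalent(acute_then_dot, partly_composed))   # True
print(canonically_equivalent(acute_then_dot, grave_variant))     # False
```

A collation is a harder case than this equality test, because its
contractions are matched against a particular sequence of characters.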

Even working only at the primary level, there are valid
primary level collations for which this statement is not true. Your
argument to invalidate my counterexample would invalidate the Burmese
(my) collation in CLDR Version 22.1. Any attempts to erect a
natural-seeming disqualifying condition are also likely to disqualify
the Tibetan elements of the default collation of the UCA.

Richard.
Received on Thu Feb 07 2013 - 03:05:39 CST
