Re: FCD and Collation

From: Richard Wordingham <>
Date: Tue, 12 Feb 2013 20:19:47 +0000

On Tue, 12 Feb 2013 01:17:45 +0000
"Whistler, Ken" <> wrote:

> One of the reasons I resisted incorporation of
> canonical enclosure in the basic UCA algorithm and in the DUCET table
> is because of its infinitesimal ROI. It complicates the table and its
> processing substantially, all in service of "fixing" edge cases of
> edge cases, which have to be dealt with in tailorings, anyway.

I presume you meant 'UCET' when you wrote 'DUCET'.

I don't see what burdensome tailorings you have in mind for an
implementation with normalization set to 'on'. To add a letter, all
that is needed in LDML is a rule such as


To tailor DUCET (Version 6.2.0d9) one just adds:

0079 030B ; [.1881.0020.0002.0079]
0059 030B ; [.1881.0020.0008.0059]
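For illustration, the two entries above can be read as a mapping from a character sequence to a single collation element. The following is only a minimal sketch: the function name `parse_entry` and the returned tuple layout are assumptions made for this example, not part of UCA or of any library, and it handles only the simple `code points ; [elements]` shape shown above.

```python
import re

# One collation element: '[' + variable marker ('.' or '*') + dot-separated hex weights + ']'
CE_PATTERN = re.compile(r"\[([.*])([0-9A-Fa-f.]+)\]")

def parse_entry(line):
    """Parse one 'code points ; [collation elements]' line (hypothetical helper)."""
    chars, _, elements = line.partition(";")
    # The left-hand side is a space-separated list of hex code points.
    key = "".join(chr(int(cp, 16)) for cp in chars.split())
    ces = []
    for marker, weights in CE_PATTERN.findall(elements):
        is_variable = (marker == "*")
        ces.append((is_variable, tuple(int(w, 16) for w in weights.split("."))))
    return key, ces

key, ces = parse_entry("0079 030B ; [.1881.0020.0002.0079]")
```

Here `key` is the two-character sequence U+0079 U+030B and `ces` holds one non-variable element whose first (primary) weight is 0x1881, so the sequence sorts as a letter in its own right.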

Complications only arise if normalisation is omitted.

> FCD isn't part of the Unicode Standard, or of UCA, for that matter.
> It is an implementation optimization promulgated in ICU. So tweaking
> its definition would be a matter for ICU, in my opinion.

The standard UCA parametric tailoring is defined by the LDML, and this
calls up the definition of FCD from UTN#5.
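As a rough sketch of what that definition amounts to (my reading of UTN#5, not a quotation from it): a string is FCD if, for each adjacent pair of characters, the canonical decomposition of the second either starts with a character of combining class 0 or starts with a class no lower than the class that ends the decomposition of the first. The function name `is_fcd` is an assumption made for this example, and the result depends on the Unicode version of Python's `unicodedata` module.

```python
import unicodedata

def is_fcd(text):
    """Return True if text passes the FCD test as sketched above (illustrative only)."""
    prev_trail = 0  # combining class of the last character of the previous decomposition
    for ch in text:
        # Full canonical decomposition of this single character.
        decomp = unicodedata.normalize("NFD", ch)
        lead = unicodedata.combining(decomp[0])
        # A non-starter that sorts before the preceding trailing mark breaks FCD.
        if lead != 0 and lead < prev_trail:
            return False
        prev_trail = unicodedata.combining(decomp[-1])
    return True
```

For example, U+00E1 followed by U+0323 is not FCD, because the precomposed letter ends in a class-230 mark and the dot below has class 220, whereas `y` followed by U+030B passes.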

> As regards the normalization on/off parameter, although UCA mentions
> it as a possible tailoring one could do, it goes no further. The
> details of a definition of a normalization on/off parameter belong
> now to LDML and the CLDR-TC, and to their use of it in defining
> locales. Personally, I think it should stay that way.

The point is that there should be a specification of what *shall* work
when normalisation is switched off. For example, to support text in
which no default grapheme cluster has more than one character and no
character is split between collating elements, one can dispense not
only with normalisation but also with discontiguous contraction! This
will support most text in a great many languages.

Received on Tue Feb 12 2013 - 14:21:37 CST
