Re: Minimal Implementation of Unicode Collation Algorithm

From: Markus Scherer via Unicode <unicode_at_unicode.org>
Date: Mon, 4 Dec 2017 12:48:11 -0800

On Mon, Dec 4, 2017 at 5:30 AM, Richard Wordingham via Unicode <
unicode_at_unicode.org> wrote:

> May a collation algorithm that always compares all strings as equal be a
> compliant implementation of the Unicode Collation Algorithm (UTS #10)?
> If not, by which clause is it not compliant? Formally, this algorithm
> would require that all weights be zero.
>

I think so. The algorithm would be equivalent to an implementation of the
UCA with a degenerate CET that maps every character to a Completely
Ignorable Collation Element.
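
To make the degenerate case concrete, here is a minimal sketch in Python. It is not taken from any real implementation: the names degenerate_cet, sort_key and compare are hypothetical, and the sort-key construction is a simplified three-level reading of UTS #10 (nonzero weights per level, with a zero separator between levels).

```python
import unicodedata

# Hypothetical degenerate table: every character maps to a single
# completely ignorable collation element (zero at all three levels).
def degenerate_cet(ch):
    return [(0, 0, 0)]

def sort_key(s, cet=degenerate_cet, levels=3):
    # Normalize to NFD, then look up collation elements per character.
    s = unicodedata.normalize("NFD", s)
    elements = [ce for ch in s for ce in cet(ch)]
    key = []
    for level in range(levels):
        if level > 0:
            key.append(0)  # level separator
        # Only nonzero weights are appended to the sort key.
        key.extend(ce[level] for ce in elements if ce[level] != 0)
    return tuple(key)

def compare(a, b):
    # Returns -1, 0 or 1 by comparing the two sort keys.
    ka, kb = sort_key(a), sort_key(b)
    return (ka > kb) - (ka < kb)
```

With this table every sort key consists of the level separators alone, so compare() returns 0 for any pair of strings.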

> Would an implementation that supported no characters be compliant?
>

I guess so. I assume that would mean that the CET maps nothing, and that
the implementation does implement the implicit weighting of Han characters
and unassigned (here: unmapped) code points. It would also have to do NFD
first.
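
As a rough sketch of what the implicit weighting amounts to, the following Python derives the two collation elements for an unmapped code point, following the formula as I read it in the implicit-weights section of UTS #10. The function name implicit_ces is hypothetical, and the caller is assumed to pick the base: FB40 for core Han unified ideographs, FB80 for other Han unified ideographs, FBC0 for everything else. Deciding which base applies needs the Unified_Ideograph property and block data, which this sketch leaves out.

```python
def implicit_ces(cp, base=0xFBC0):
    """Return the two implicit collation elements for code point cp.

    base is an assumption supplied by the caller: 0xFB40, 0xFB80 or
    0xFBC0, depending on how the code point is classified.
    """
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    aaaa = base + (cp >> 15)
    bbbb = (cp & 0x7FFF) | 0x8000
    # [.AAAA.0020.0002][.BBBB.0000.0000]
    return [(aaaa, 0x0020, 0x0002), (bbbb, 0x0000, 0x0000)]

# U+4E00 treated as a core Han ideograph (base chosen by the caller).
han = implicit_ces(0x4E00, base=0xFB40)
# An arbitrary unmapped code point with the default base.
other = implicit_ces(0x10FFFF)
```

With an empty CET, every code point in the NFD-normalized input would go through a function like this one.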

> It used to be that for an implementation to be claimed as compliant, it
> also had to pass a specific conformance test. This requirement has now
> been abandoned, perhaps because the Default Unicode Collation Element
> Table (DUCET) is incompatible with the CLDR Collation Algorithm.
>

The DUCET is missing some things that are needed by the CLDR Collation
Algorithm, but that has nothing to do with UCA compliance.

The simple fact is that tailorings are common, and it has to be possible to
conform to the algorithm without forbidding tailorings.

markus
Received on Mon Dec 04 2017 - 14:48:43 CST
