Re: Additional decompositions in decomps.txt from Ken Whistler on 2016-02-22 (Unicode Mail List Archive)

From: Ken Whistler <kenwhistler_at_att.net>
Date: Mon, 22 Feb 2016 10:10:35 -0800

Eli,

You're not missing anything. This is a bug in the documentation of
decomps.txt. Initially, added decompositions for the DUCET default
weights were all tagged as <sort>. This results in a distinct *tertiary*
weight in the initial collation weight values in DUCET. Later on,
there turned up cases where an added decomposition for the DUCET
input worked better *without* a distinct tertiary weight. In
particular, this applies to the large collection of combining marks
whose secondary weights are now collapsed into a smaller set of
distinct values. It also applies to the o with stroke character you
cite below. The documentation for decomps.txt just needs to be
updated to reflect that new pattern.
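
For anyone scripting against the file, here is a minimal sketch of how the tagged and untagged entries might be told apart. It assumes a three-field layout (code point; optional tag; decomposition, then a "#" comment) inferred from the two lines quoted below, and it assumes the tag, when present, appears in the second field as "<sort>". The names Decomp, parse_decomp_line and is_sort_tagged are made up for illustration, and the sample "<sort>" line uses private-use code points and is not a real decomps.txt entry.

```python
from typing import NamedTuple, Optional, Tuple


class Decomp(NamedTuple):
    code_point: int
    tag: Optional[str]          # e.g. "<sort>", or None when the field is empty
    mapping: Tuple[int, ...]    # decomposition as a sequence of code points


def parse_decomp_line(line: str) -> Optional[Decomp]:
    """Parse one line, assuming the layout 'code;tag;mapping # comment'."""
    body = line.split("#", 1)[0].strip()   # drop the trailing comment
    if not body:
        return None                        # blank or comment-only line
    fields = [f.strip() for f in body.split(";")]
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}: {line!r}")
    code, tag, mapping = fields
    return Decomp(
        code_point=int(code, 16),
        tag=tag or None,
        mapping=tuple(int(cp, 16) for cp in mapping.split()),
    )


def is_sort_tagged(entry: Decomp) -> bool:
    """True for entries carrying the <sort> tag (assumed spelling)."""
    return entry.tag == "<sort>"


samples = [
    # The two untagged lines quoted in the original question.
    "00F8;;006F 0338 # LATIN SMALL LETTER O WITH STROKE",
    "0142;;006C 0335 # LATIN SMALL LETTER L WITH STROKE",
    # Hypothetical tagged line (private-use code points, not from the file).
    "E000;<sort>;E001 E002 # made-up entry for illustration",
]

entries = [parse_decomp_line(s) for s in samples]
for e in entries:
    kind = "<sort>" if is_sort_tagged(e) else "untagged"
    print(f"U+{e.code_point:04X} -> {kind}")
```

An untagged entry with no decomposition in UnicodeData.txt is the pattern the current documentation does not describe.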

--Ken

On 2/21/2016 8:32 AM, Eli Zaretskii wrote:
> # 3. In some cases a new decomposition is added for a character which
> # has no decomposition mapping in UnicodeData.txt. In this third case,
> # a new decomposition tag "<sort>" is introduced, to distinguish these
> # introduced decompositions from those derived from UnicodeData.txt.
>
> However, I see in decomps.txt entries that seem to belong to neither
> of the 3 classes described above. Here are 2 notable examples:
>
> 00F8;;006F 0338 # LATIN SMALL LETTER O WITH STROKE => LATIN SMALL LETTER O + COMBINING LONG SOLIDUS OVERLAY
> 0142;;006C 0335 # LATIN SMALL LETTER L WITH STROKE => LATIN SMALL LETTER L + COMBINING SHORT STROKE OVERLAY
>
> In both these cases, UnicodeData.txt defines no decomposition
> properties, but the "<sort>" tag I expected to see is absent from
> decomps.txt. Is there something I'm missing here?
>
Received on Mon Feb 22 2016 - 12:12:05 CST
