Re: annotations (was: NamesList.txt as data source)

From: Marcel Schneider <>
Date: Mon, 14 Mar 2016 03:14:05 +0100 (CET)

On Sun, 13 Mar 2016 13:03:20 -0600, Doug Ewell wrote:

> My point is that of J.S. Choi and Janusz Bień: the problem with
> declaring NamesList off-limits is that it does contain information that
> is either:
> • not available in any other UCD file, or
> • available, but only in comments (like the MAS mappings), which aren't
> supposed to be parsed either.
> Ken wrote:
> > [ .. ] NamesList.txt is itself the result of a complicated merge
> > of code point, name, and decomposition mapping information from
> > UnicodeData.txt, of listings of standardized variation sequences from
> > StandardizedVariants.txt, and then a very long list of annotational
> > material, including names list subhead material, etc., maintained in
> > other sources.
> But sometimes an implementer really does need a piece of information
> that exists only in those "other sources." When that happens, sometimes
> the only choices are to resort to NamesList or to create one's own data
> file, as Ken did by parsing the comment lines from the math file. Both
> of these are equally distasteful when trying to be conformant.

If so, then extending the XML UCD with all the information that is missing from it but available in the Code Charts and NamesList.txt turns out to be a good idea. It remains, however, that such a step would greatly increase the amount of data, because items that were never meant to be provided systematically would then have to be.

Further, once this is done, other requirements might call for the same job to be done on the core specification.

The question is whether such needs are frequent in Unicode implementation and internationalization. For example, the recent apostrophe thread showed that full automation is sometimes impossible anyway.

Received on Sun Mar 13 2016 - 21:15:48 CDT
