Re: NamesList.txt as data source from Doug Ewell on 2016-03-27 (Unicode Mail List Archive)

From: Doug Ewell <doug_at_ewellic.org>
Date: Sun, 27 Mar 2016 12:38:53 -0600

Asmus Freytag wrote:

> Nobody disputes that subheaders are informative. However, subheaders
> do not define a character property.

Janusz was making a point that the CLDR data sometimes treats them as
such, or at least as a kind of supplementary property.

> There are several good reasons:
>
> 1. They do not "classify" characters in a uniform way: For some ranges
> they give the purpose for which the character was encoded (as in your
> example), for others, they give the type of character (vowel,
> consonant), and in some cases they are free of information
> ("Miscellaneous addition").
>
> 2. Even where they give the purpose for which the character was
> encoded, they do not necessarily attest that the characters in that
> range are never used for other purposes.
>
> 3. The information is purely editorial, and as such, changed by the
> editors as needed, not assigned as result of a vote in the Unicode
> Technical Committee.
>
> 4. They appear to be more "formal" than they are, just because they
> are presented with semantic markup in the input file to the code chart
> layout tool; with the file being a rather structured file, only
> because it describes a tabular presentation of data. However, see
> points (1) through (3) on why this superficial appearance of formality
> is misleading.

It seems that the main concern about using NamesList.txt to obtain
information beyond what is available in other UCD sources is that people
might treat that additional information as normative and immutable, when
it is not.

It is understood that UTC members draw important distinctions between
normative and informative material, and between material that is
immutable and that which may change over time. For many purposes, these
distinctions are crucial. However, there are uses for Unicode character
data that do not depend on these distinctions. Often it is simply not a
problem if, say, CAT FACE WITH WRY SMILE acquires a new informative
cross-reference in one Unicode release, and that cross-reference
suddenly changes or disappears in the next release.

My suggestion to assuage these fears is for UTC to add warnings to the
file header (right below "This file is semi-automatically derived...")
or to NamesList.html, or both, stating that any information in
NamesList.txt beyond that which can be found in other UCD files is
informative and subject to change without notice. Then the burden, if
such it is, will be on users to heed these warnings.
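
To make this kind of use concrete, here is a minimal Python sketch of a
consumer that pulls subheaders and cross-references out of
NamesList.txt-style text and treats them as informative only. The sample
lines, the function name informative_annotations, and the regular
expressions are illustrative assumptions, not part of any Unicode tool.
The line syntax is assumed from the format described in NamesList.html:
block headers start with "@@", subheaders with "@" plus a tab, and
cross-references with a tab followed by "x".

```python
import re

# Hypothetical sample in NamesList.txt style (tab-separated fields).
SAMPLE = (
    "@@\t0000\tC0 Controls and Basic Latin (Basic Latin)\t007F\n"
    "@\t\tASCII punctuation and symbols\n"
    "0021\tEXCLAMATION MARK\n"
    "\t= factorial\n"
    "\tx (inverted exclamation mark - 00A1)\n"
    "@\t\tASCII digits\n"
    "0030\tDIGIT ZERO\n"
)

# Character entry: code point, tab, name.
CHAR_LINE = re.compile(r"^([0-9A-F]{4,6})\t(.+)$")
# Cross-reference: tab, "x", then a code point at the end of the line,
# optionally wrapped as "(name - XXXX)".
XREF_LINE = re.compile(r"^\tx .*?([0-9A-F]{4,6})\)?$")


def informative_annotations(text):
    """Map code point -> name, subheader, and cross-references.

    Everything except the name is informative and may change or
    disappear between Unicode releases, so callers should not
    treat it as a stable character property.
    """
    result = {}
    subheader = None
    current = None
    for line in text.splitlines():
        if line.startswith(";"):
            continue  # file comment
        if line.startswith("@@"):
            # A new block (or title) resets the subheader context.
            subheader = None
            current = None
        elif line.startswith("@\t"):
            subheader = line[1:].strip()
        elif line.startswith("@"):
            continue  # notices such as "@+" are ignored in this sketch
        else:
            char = CHAR_LINE.match(line)
            if char:
                current = int(char.group(1), 16)
                result[current] = {
                    "name": char.group(2),
                    "subheader": subheader,
                    "xrefs": [],
                }
                continue
            xref = XREF_LINE.match(line)
            if xref and current is not None:
                result[current]["xrefs"].append(int(xref.group(1), 16))
    return result


if __name__ == "__main__":
    for cp, info in informative_annotations(SAMPLE).items():
        print(f"U+{cp:04X}", info)
```

A consumer written this way simply re-reads the file for each release.
If a subheader or cross-reference changes or vanishes, the output
changes with it and nothing breaks.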

--
Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸

Received on Sun Mar 27 2016 - 13:40:15 CDT
