Re: NamesList.txt as data source

From: Janusz S. Bień <>
Date: Tue, 29 Mar 2016 06:40:02 +0200

On Mon, Mar 28 2016 at 13:59 CEST, writes:


> But subheads are not Unicode Character Properties.

As Doug has already said, nobody claims this.

> And repeating the caveats expressed earlier,

There has been a lot of repetition in this thread...

> the Nameslist data is designed for chart production, not as a reliable
> source of machine-readable data.

I guess you understand "machine-readable data" (and consequently "data
mining") in a specific, very narrow way.

> While it may be in some cases useful to look at, the subheads are not
> designed to be a consistent source of data.

Can we agree that NamesList.txt is a reliable source of machine-readable
data about the Unicode *charts*?
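
To make "machine-readable data about the charts" concrete, here is a
minimal sketch of reading subheads out of text laid out like
NamesList.txt. It rests on assumptions: that a line starting with "@@"
opens a block, that a line starting with "@" plus a tab is a subhead, and
that a line starting with a hexadecimal code point plus a tab is a
character entry. The function name and the sample text are hypothetical;
the sample is hand-written for illustration and is not copied from the
real file.

```python
import re

# Hand-written sample in the assumed NamesList.txt layout (not the real file).
SAMPLE = (
    "@@\tA720\tLatin Extended-D\tA7FF\n"
    "@\t\tMedievalist addition\n"
    "A753\tLATIN SMALL LETTER P WITH FLOURISH\n"
)

# A character entry: 4 to 6 uppercase hex digits, a tab, then the name.
ENTRY = re.compile(r"^([0-9A-F]{4,6})\t(.+)$")


def subheads_by_code_point(text):
    """Map each code point to (character name, enclosing subhead or None)."""
    result = {}
    subhead = None
    for line in text.splitlines():
        if line.startswith("@@"):
            # A new block (or block-level directive) ends the current subhead.
            subhead = None
        elif line.startswith("@\t"):
            # Subhead line: drop the "@" and the surrounding tabs.
            subhead = line[1:].strip()
        else:
            match = ENTRY.match(line)
            if match:
                code_point = int(match.group(1), 16)
                result[code_point] = (match.group(2), subhead)
    return result


table = subheads_by_code_point(SAMPLE)
name, subhead = table[0xA753]
print(f"U+A753 {name} (subhead: {subhead})")
```

Such a table describes how the charts group characters; it says nothing
about Unicode character properties, which is exactly the distinction
being argued in this thread.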

On Sun, Mar 27 2016 at 6:38 CEST, writes:


> 3 The information is purely editorial, and as such, changed by the
> editors as needed, not assigned as result of a vote in the Unicode
> Technical Committee.

Changes are not a problem if properly documented, but this is another

Let's now be more specific:

On Sun, Mar 27 2016 at 5:00 CEST, writes:
> Janusz Bień wrote:
>> Am I right that this information is available only in NamesList.txt?
> It probably comes from what Ken referred to as "a very long list of
> annotational material, including names list subhead material, etc.,
> maintained in other sources."
> If you don't have access to those "other sources,"

See below.

> then as far as I
> can tell, yes, it's available only in NamesList.txt.
> --
> Doug Ewell | | Thornton, CO 🇺🇸

On Sun, Mar 27 2016 at 6:38 CEST, writes:
> On 3/26/2016 2:10 AM, Janusz S. Bień wrote:


> I've just noticed that NamesList.txt is in a sense data mined by the
> Unicode consortium itself. I mean the "Unicode Utilities: Character
> Properties", which e.g. for LATIN SMALL LETTER P WITH FLOURISH
> ( display in
> particular
> subhead: Medievalist addition


> If you seriously wanted to present "all that is known about a
> character" you would need to excerpt all mentions of it in the core
> specification, as well as (potentially) any additional details
> presented in the version of the proposal document that was approved by
> the UTC as part of encoding the character.


The essential information for LATIN SMALL LETTER P WITH FLOURISH is that
in medieval manuscripts it is used for "pro" or "por". This information
is available only in

Is this a static and permanent link? What is the copyright status of the
document? For example:

Can it be redistributed and replicated on other sites? Can it be quoted
literally in a Wikipedia entry?

In general, what can be done to make access to such information easier?

Best regards


Prof. dr hab. Janusz S. Bien - Uniwersytet Warszawski (Katedra Lingwistyki Formalnej)
Prof. Janusz S. Bien - University of Warsaw (Formal Linguistics Department)
Received on Mon Mar 28 2016 - 23:42:02 CDT