Re: Documented fields in NameAliases.txt
Source: Mark Davis
Date: 2013-10-06

In the file, it says:

# For documentation, see NamesList.html and
# Each line has three fields, as described here:
# First field:  Code point
# Second field: Alias
# Third field:  Type
# The Type labels used are: correction, control, alternate, figment, abbreviation
# Those Type labels can be mapped to other strings for display, if desired.

But those Type values are not documented in either of the mentioned files, or in the header. I suggest something simple in the header like the following:

correction - A corrected name for UIs where the formal Unicode name is mistaken or misleading in one way or another. For a given code point, there is at most one value with Type=correction.

control - The most commonly used names for a control code. (For historical reasons, the control codes don't have formal Unicode names.) For a given code point, there may be multiple values with Type=control (such as U+008D, with "REVERSE LINE FEED" and "REVERSE INDEX").
[Question: If one of the aliases is more commonly used, is it listed first? That would be useful...]

alternate - An alternate name. For a given code point, there may be multiple values with Type=alternate.

figment - ??? [Ken would have to explain this.]

abbreviation - A common abbreviation for the character name or control code name.  For a given code point, there may be multiple values with Type=abbreviation.
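
The cardinality rules suggested above (at most one correction per code point, possibly several of the other types) could be checked mechanically. A minimal sketch in Python, assuming the three fields are separated by semicolons and that "#" starts a comment; the function names are illustrative, and the sample lines are stand-ins rather than an excerpt from the file:

```python
from collections import defaultdict

def parse_alias_lines(lines):
    """Yield (code_point, alias, type) tuples from NameAliases-style lines."""
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        # Assumption: exactly three semicolon-separated fields per data line.
        code, alias, alias_type = (field.strip() for field in line.split(";"))
        yield int(code, 16), alias, alias_type

def multiple_corrections(entries):
    """Return the code points that carry more than one Type=correction alias."""
    seen = defaultdict(list)
    for code_point, alias, alias_type in entries:
        if alias_type == "correction":
            seen[code_point].append(alias)
    return {cp: names for cp, names in seen.items() if len(names) > 1}

# Stand-in data: the two U+008D control aliases mentioned above.
sample = [
    "# First field: Code point; second: Alias; third: Type",
    "008D;REVERSE LINE FEED;control",
    "008D;REVERSE INDEX;control",
]
entries = list(parse_alias_lines(sample))
violations = multiple_corrections(entries)  # empty dict when the rule holds
```

Running multiple_corrections over the parsed contents of the real file would confirm whether the "at most one correction" statement actually holds for the current data.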


We recommend that regular expressions support both the formal names and the Name_Aliases. However, the 'figment' type above looks suspicious: is it something that we should not recommend people match? Hard to tell without knowing what it means...

FYI: in U6.3, there are the following counts in NameAliases.txt:

352 abbreviation
84 control
17 correction
3 figment
1 alternate
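
A tally like the one above can be produced with a few lines of Python. This is only a sketch, assuming semicolon-separated fields and "#" comments; count_alias_types is a hypothetical helper name, and the sample lines are stand-ins for the real file contents:

```python
from collections import Counter

def count_alias_types(lines):
    """Count how many data lines carry each value of the third (Type) field."""
    counts = Counter()
    for line in lines:
        line = line.split("#", 1)[0].strip()  # ignore comments and blank lines
        if not line:
            continue
        fields = line.split(";")
        if len(fields) == 3:  # skip anything that is not a three-field record
            counts[fields[2].strip()] += 1
    return counts

# Stand-in data; for the real tally, pass an open file object instead, e.g.
#   with open("NameAliases.txt", encoding="utf-8") as f:
#       type_counts = count_alias_types(f)
sample = [
    "# illustrative lines only",
    "008D;REVERSE LINE FEED;control",
    "008D;REVERSE INDEX;control",
]
type_counts = count_alias_types(sample)
for alias_type, n in type_counts.most_common():
    print(n, alias_type)
```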

There is only one non-control-character with more than one alias: