Re: PRI #202: Extensions to NameAliases.txt for Unicode 6.1.0

From: Asmus Freytag <asmusf_at_ix.netcom.com>
Date: Sat, 27 Aug 2011 11:20:19 -0700

On 8/27/2011 1:31 AM, Andrew West wrote:
> On 27 August 2011 09:25, Andrew West <andrewcwest_at_gmail.com> wrote:
>> On 27 August 2011 03:52, Benjamin M Scarborough
>> <benjamin.scarborough_at_utdallas.edu> wrote:
>>> Are name aliases exempted from the normal character naming conventions? I ask because four of the entries have words that begin with numbers.
>>>
>>> 008E;SINGLE-SHIFT 2;control
>>> 008F;SINGLE-SHIFT 3;control
>>> 0091;PRIVATE USE 1;control
>>> 0092;PRIVATE USE 2;control
>>>
>> ISO 6429 (and consequently ISO/IEC 10646 Section 11) calls these characters:
>> SINGLE-SHIFT TWO
>> SINGLE-SHIFT THREE
>> PRIVATE USE ONE
>> PRIVATE USE TWO
>>
>> <http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-048.pdf>
>>
>> Changing their names to "SINGLE-SHIFT 2" or "SINGLE-SHIFT-2" etc is
>> surely contrary to the whole point of the exercise.
> Sorry, ignore that. I hadn't noticed that the digit forms were in
> addition to the forms with numbers written as words.

Actually, you brought something to my attention that I had missed on
reading the file, so I won't ignore this.

Having these ill-formatted names *in addition* to essentially the same
name, but one that follows the naming conventions, strikes me as silly.
It would set a potential precedent for adding aliases for any character
name containing either a digit or the name of that digit. The PRI
gives no rationale for the inclusion of names "valid in earlier versions".
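
To make "ill-formatted" concrete, here is a minimal sketch in Python. It
assumes the convention at issue is the one Benjamin describes above, namely
that no space-separated word in a name may begin with a digit. The function
name is hypothetical, and the check covers only that one rule, not the full
naming guidelines.

```python
ASCII_DIGITS = "0123456789"

def words_starting_with_digit(name):
    """Return the space-separated words of `name` that begin with a digit.

    Assumption: this is the only rule being checked; an empty result
    means the name passes this one test, nothing more.
    """
    return [w for w in name.split(" ") if w and w[0] in ASCII_DIGITS]

# The four digit-form aliases quoted above, plus the ISO 6429 spellings.
for candidate in ("SINGLE-SHIFT 2", "SINGLE-SHIFT 3",
                  "PRIVATE USE 1", "PRIVATE USE 2",
                  "SINGLE-SHIFT TWO", "PRIVATE USE ONE"):
    offending = words_starting_with_digit(candidate)
    print(candidate, "->", offending if offending else "ok")
```

Under that assumption the four digit forms are flagged and the spelled-out
forms pass.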

If there's a known deviation that is currently supported (as a named
character ID, such as in regular expressions) in widely distributed
software, I would support the addition on compatibility grounds (with
tweaks that follow the naming rules). But adding a name simply because it
existed once (and was later deprecated) strikes me as going in the same
"encyclopedic" direction that Ken himself has disavowed.

I do think now that grouping the file is a bad idea, because several
people in this discussion, myself included, missed these particular near
duplicates. The natural thing is wanting to know all names/aliases for a
character. If someone needs grouping for some purposes, a spreadsheet or
other tool can easily be used to filter by status field.
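
As a rough illustration of how little tooling that takes, here is a sketch
in Python. It assumes each data line has the form code;name;status, as in
the entries quoted above, and that "#" starts a comment. The function names
are hypothetical. The four "control" lines are the ones quoted earlier; the
"iso6429" line is my assumption about how the spelled-out form would appear
in the file.

```python
from collections import defaultdict

SAMPLE = """\
008E;SINGLE-SHIFT TWO;iso6429
008E;SINGLE-SHIFT 2;control
008F;SINGLE-SHIFT 3;control
0091;PRIVATE USE 1;control
0092;PRIVATE USE 2;control
"""

def parse_aliases(text):
    """Parse code;name;status lines into (code_point, name, status) tuples."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        code, name, status = [field.strip() for field in line.split(";")]
        entries.append((int(code, 16), name, status))
    return entries

def aliases_by_code_point(entries):
    """Collect all aliases per character, whatever order the file uses."""
    grouped = defaultdict(list)
    for code_point, name, status in entries:
        grouped[code_point].append((name, status))
    return dict(grouped)

def filter_by_status(entries, status):
    """Recover a per-status grouping from an ungrouped file."""
    return [entry for entry in entries if entry[2] == status]

entries = parse_aliases(SAMPLE)
for code_point, aliases in sorted(aliases_by_code_point(entries).items()):
    print(f"{code_point:04X}", aliases)
```

With the file sorted by code point, the near duplicates for 008E sit next to
each other, and filter_by_status gives back the grouped view for anyone who
wants it.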

I also think that the status field "iso6429" is badly named. It should
be "control", and what is named control should be "control-alternate",
or perhaps, both of these groups should become simply "control". I think
the labels chosen by the data file just set up bad precedents. If 6429,
why not a section for 9535 (or whatever the kbd standard is) etc.

A./
Received on Sat Aug 27 2011 - 13:27:00 CDT