Re: PRI #202: Extensions to NameAliases.txt for Unicode 6.1.0

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Sat, 27 Aug 2011 02:01:25 +0200

2011/8/27 Ken Whistler <kenw_at_sybase.com>:
> On 8/26/2011 3:13 PM, Philippe Verdy wrote:
>>
>> Isn't there an intersection between NameAliases.txt proposed in
>> PRI202, and the informational table defined for UTR #25 at
>> http://www.unicode.org/Public/math/revision-12/MathClassEx-12.txt
>> which also lists other name aliases for other standards ?
>
> No.
>
>>
>> Couldn't there be a way to merge those lists ?
>
> No, there isn't. They have completely different statuses.
> NameAliases.txt is a normative part of the versioned UCD
> and is used as part of the definition of the normative namespace
> for Unicode character names. MathClassEx.txt is not part of
> the UCD, has no normative status for the Unicode Standard, and
> is associated with a UTR whose versioning is not synchronized
> with the Unicode Standard.
>
>>
>> It would have the advantage of suppressing those names from the
>> proposed table for UTR #25 (characters used in Mathematical
>> notations).
>
> Which would be a disadvantage, actually, because it would remove them from
> the context where they are useful.
>
>>
>> In the merged name aliases table, we could as well include :
>
> "we could as well include..." are dangerous words here. Going encyclopedic
> is *completely* at odds with the normative intention of NameAliases.txt.

Your statement then contradicts what PRI 202 says:
"the intent is to add various standard and de facto aliases for
control characters, which have no names defined for them in the
Unicode Standard, as well as various character abbreviations which are
in widespread use."

It explicitly links the Unicode Standard with others, at least by
reference. If these aliases are ALL to be unique in the UCS namespace,
this means that it will permanently link those standards to the UCS.

Maybe that will be fine for other standards that are now stable (or
frozen and kept for historical reasons; this is the case for the
standard PostScript namespace, now frozen in the AGL and in
PostScript's "StandardEncoding", for use in TrueType, OpenType, and
PDF).

Yes, I admit that the PostScript namespace is a bit different: it is
glyph-based rather than character-based, which also means that several
UCS characters may map by default to the same glyph name. But one of
those characters is still considered the main one (for example, the
"space" glyph name is normally mapped from both U+0020 and U+00A0,
but the first one is usually chosen by default when performing the
reverse mapping, if there is no other disambiguating context).
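
To make that reverse mapping concrete, here is a minimal sketch in
Python. The two-entry table and the names GLYPH_TO_CODE_POINTS,
CODE_POINT_TO_GLYPH and glyph_to_default_code_point are hypothetical,
made up for illustration only; they are not taken from the AGL or
from any font tool.

```python
# Hypothetical excerpt of a glyph-name table: each glyph name lists the
# code points that map to it, with the preferred ("main") one first.
GLYPH_TO_CODE_POINTS = {
    "space": [0x0020, 0x00A0],
    "A": [0x0041],
}

# Forward mapping (code point -> glyph name) is many-to-one.
CODE_POINT_TO_GLYPH = {
    cp: name
    for name, cps in GLYPH_TO_CODE_POINTS.items()
    for cp in cps
}


def glyph_to_default_code_point(name):
    """Reverse mapping: pick the main code point when nothing else disambiguates."""
    return GLYPH_TO_CODE_POINTS[name][0]
```

Both U+0020 and U+00A0 resolve to "space" in the forward direction,
while the reverse lookup for "space" falls back to U+0020.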

A similar case occurs with the GSM standard encoding (which does not,
for example, distinguish between LATIN CAPITAL LETTER A,
CYRILLIC CAPITAL LETTER A, and GREEK CAPITAL LETTER ALPHA), as well as
with many legacy encodings that were also glyph-based and defined with
something other than a chart of representative glyphs (found in the
"/MAPPINGS" subdirectory, a sister to the "/UNIDATA" directory used by
the UCD).

Then why do you think, in PRI 202, that the character names of some
standards should become part of the UCS namespace? They could just as
well remain informative, and we could have another informative
data file (in the "MAPPINGS" subdirectory) to reference those standards
only informatively, without introducing them into the UCD...

For example, the proposed ISO 6429 names don't have to be a
normative part of the UCD; they could remain informational as well,
defined outside of it. They are not (and should not be) needed for a
conforming implementation of the UCS and the Unicode algorithms, unless
the Unicode Standard really wants to bind itself permanently to the
ISO 6429 standard, possibly against the intent of that standard's
authors. Was there such a formal request from the ISO standard's
maintainers, and an agreed policy?

-- Philippe.
Received on Fri Aug 26 2011 - 19:04:17 CDT
