Re: String name and Character Name

From: Peter Kirk (peterkirk@qaya.org)
Date: Sat Apr 23 2005 - 08:45:03 CST

Next message: Peter Kirk: "Re: String name and Character Name"

Previous message: Peter Kirk: "Re: String name and Character Name"
In reply to: Otto Stolz: "Re: String name and Character Name"
Next in thread: Otto Stolz: "Re: String name and Character Name"
Reply: Otto Stolz: "Re: String name and Character Name"

On 22/04/2005 10:09, Otto Stolz wrote:

> Hello Peter Kirk,
>
> you have written:
>
>> I don't know why there is a need for a second "unique and immutable
>> identifier" in addition to the U+xxxx code point identifier.
>
>
> Have you ever read Section C.6 of TUS
> <http://www.unicode.org/versions/Unicode4.0.0/appC.pdf>?
>
No. Well, I had not before. (Jill, thanks for defending me, but in fact
I have not been on this list for much longer than you, and I was not a
silent lurker!) But I am aware of its contents. I note:

> In the ISO/IEC framework, the unique character name is viewed as the
> major resource for
> both character semantics and cross-mapping among standards. In the
> framework of the
> Unicode Standard, ...

but the sentence does not continue by pointing out the pitfalls of using
these unique character names. Nor does it mention that, to quote Asmus,
"the intended purpose of the nameslist was deliberately *reduced* to
providing an unique and immutable identifier".

Elsewhere you wrote:

> So much for the "obvious places"
> where another contributor to this thread ostensibly had looked to
> no avail.
>
>> I really don't understand why this thread is getting warm.
>
>
> It's just because some of the contributors to this thread apparently
> have not bothered to do this sort of basic (and simple!) research
> before conceiving (and conveying) their ideas.

If this is intended as a reference to me, please withdraw it. I made no
claims about where I had looked for information. I was well aware of the
contents of Appendix C.6, even though I had not read the specific text.
But if there is a place in TUS where it is made clear that "the
intended purpose of the nameslist [is only] providing an unique and
immutable identifier", it is neither of the two places which you quote.

>> But given that there is such a list, its highly restricted intended
>> purpose should be made more clear.
>
>
> How could that be made clearer than in TUS, section 16.1?
>
> Quote from <http://www.unicode.org/versions/Unicode4.0.0/ch16.pdf>:
>
>> The character names in the code charts precisely match the normative
>> character names in
>> the Unicode Character Database. Character names are unique and
>> stable. By convention
>> they are in uppercase. Because character names are stable, mistaken
>> names will not be
>> revised, but may be annotated. For example:
>> 2118 ℘ SCRIPT CAPITAL P
>> = Weierstrass elliptic function
>> • actually this has the form of a lowercase calligraphic p,
>> despite its name
>
>
Otto, I am aware that you are probably not a mother tongue speaker of
English, but your written English is so good that I would expect you to
understand that nowhere in the above quotation is there even the
slightest suggestion that "the intended purpose of the nameslist [is
only] providing an unique and immutable identifier", and "it does not
explicitly include the task of supporting users in identifying
characters". Elsewhere this section does state:

> the formal character names may differ in unexpected ways from commonly
> used names

but fails to draw the obvious conclusion, the one accepted by the UTC
according to Asmus: that formal character names should not be
considered to have any significance except in that they are unique and
immutable.
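
The gap between a formal name and the character's actual properties can
be checked mechanically. What follows is only a minimal sketch in Python,
assuming the standard unicodedata module (whose data comes from a later
Unicode version than 4.0). The variable names are mine, chosen for
illustration.

```python
import unicodedata

# U+2118, the example annotated in the section 16.1 quotation above.
ch = "\u2118"

# The formal character name, which is stable and will not be revised.
name = unicodedata.name(ch)

# What a reader might infer from the name alone.
name_says_capital = "CAPITAL" in name.split(" ")

# What the character properties actually say about case.
is_uppercase = ch.isupper()

print(f"U+{ord(ch):04X} {name}")
print(f"name contains CAPITAL: {name_says_capital}")
print(f"treated as uppercase:  {is_uppercase}")
```

Here the name contains the word CAPITAL while the character is not
treated as an uppercase letter, so the name works as an identifier but
not as a description.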

On further consideration, I have realised that there is no need to call
for the list of character names to be formally deprecated, because the
UTC has already effectively done this by their decision as follows:

> the intended purpose of the nameslist was deliberately *reduced* to
> providing an unique and immutable identifier, subject to the rules of
> Annex L in ISO/IEC 10646 insofar as enforced by WG2.

For, as Dean pointed out, if

>"... emphasizing that these are really semi-arbitrary character
>identifiers and not names per se" sounds awfully close to "deprecation"
>as names.
>
then reducing their purpose "to providing an unique and immutable
identifier" sounds even closer to "deprecation".

I am not sure what effect the "subject to the rules of Annex L in
ISO/IEC 10646 insofar as enforced by WG2" part has in practice, but this
seems to mark the point at which this goes outside the control of the
UTC and into that of WG2. And so I am not sure whether I need to suggest
that WG2 also makes changes.

But there is a problem in that this decision of the UTC has not been put
into proper effect even within the text of the Unicode standard itself,
in which there are a huge number of cases of a Unicode character name
being given semantic significance. For an example taken almost at
random, I quote the following from section 16.1, p.415:

> When a case mapping corresponds solely to a difference based on SMALL
> versus CAPITAL in the names of the characters, the case mapping is not
> given in the names list but only in the Unicode Character Database.

In other words, case mappings depend on character names, in breach of
the principle that "the intended purpose of the nameslist [is only]
providing an unique and immutable identifier". I suspect that I could
find hundreds of breaches of this principle within the text of the
standard. A good test for the editors would be whether the text remains
comprehensible if the character name is replaced by a meaningless (but
unique) string - if not, it is clear that the character name is acting
as more than "an unique and immutable identifier".

> As said before: if you feel like suggesting a better wording
> then submit via <http://www.unicode.org/reporting.html>.
>
I will accept this suggestion because this time you are talking about
changes which can be made. But I can hardly take it up, because I doubt
if the editing box on that page will accept the almost complete rewrite
of the text of the standard which would be required to properly
implement the restricted purpose of the nameslist. And, more seriously,
large-scale editing of this kind, to conform to the decisions of the
UTC, should be the job of the editors of the standard.

-- 
Peter Kirk
peter@qaya.org (personal)
peterkirk@qaya.org (work)
http://www.qaya.org/


This archive was generated by hypermail 2.1.5 : Sat Apr 23 2005 - 08:57:42 CST