Re: String name and Character Name

From: Kenneth Whistler (kenw@sybase.com)
Date: Tue Apr 12 2005 - 16:02:12 CST

Next message: John Hudson: "Re: String name and Character Name"

Previous message: David Starner: "Re: String name and Character Name"
Maybe in reply to: Sinnathurai Srivas: "String name and Character Name"
Next in thread: Edward H. Trager: "Re: String name and Character Name"
Reply: Edward H. Trager: "Re: String name and Character Name"
Reply: Dean Snyder: "Re: String name and Character Name"
Reply: Hans Aberg: "Abstract Character Name"

Ed Trager responded to John Hudson:

> In as much as no one wants the standard to include incorrect
> and meaningless things,

And you will find general consensus on that point.

> then
> I think it is perfectly reasonable to contemplate changing
> incorrect and meaningless names.

It is perfectly reasonable to contemplate this, which is why
reasonable people on this list *are* contemplating it.

> But perhaps we can conclude that it is not a high priority
> on the agendas of the relevant parties.

But on that point, you may be misreading the consensus among
the standards participants on this list.

It is not the case that:

   It is not a high priority of the relevant parties to
   change incorrect and meaningless names.

It is the case that:

   It *is* a high priority of the relevant parties *not*
   to change *any* character name, once published.

> For anyone to say, "it cannot be changed and won't be changed"
> without a very good explanation
> of *why* it cannot be changed just sounds like some sort of
> hubris in this mailing list, probably
> not intended, but that's what it sounds like.

Hubris, no.

Frustration and aggravation at having to explain established
policies over and over to people who apparently refuse to
listen, yes.

This policy dates from a famous ruckus a decade ago over
the name of æ and Æ.

1993-07-08:

   Denmark is issuing this defect report to ISO 10646-1:1993
   based on the naming of Danish, Faroese and Greenlandic letter
   "Æ" in upper and lower case and with acute accent. The
   character "Æ" is also used as letter in the Norwegian
   and Icelandic languages. Please find enclosed an official
   statement from the Danish Standards Association concerning
   the Danish letter "Æ". During the process of writing the
   ISO 10646-1:1993, the naming was correct - for example
   "LATIN CAPITAL LETTER AE" - in the second DIS. It was
   changed to "LATIN CAPITAL LIGATURE AE" in the final version
   of the ISO 10646-1 (1993). ...

This defect report took over two years to resolve, with
Francophones and Scandinavians at loggerheads every step of
the way, until DCOR No. 1 to 10646-1:1993 was published in
1996.

The Unicode Standard, being synchronized with 10646, was dragged
along in this process.

Unicode 1.0

  U+00E6 LATIN SMALL LETTER A E
    = ISO LATIN SMALL LETTER AE <-- the name in ISO 8859-1

Unicode 1.1

  U+00E6 LATIN SMALL LIGATURE AE <-- synchronized with 10646-1:1993
    = LATIN SMALL LETTER A E

Unicode 2.0

  U+00E6 LATIN SMALL LETTER AE <-- applied DCOR No. 1 to 10646-1:1993
    = LATIN SMALL LIGATURE AE

The fact that this entire fight, and the attendant confusion it
left in *all* of the standards documents from the 1993 - 1996
period, had not one single beneficial consequence for
implementations of æ and Æ, and that it left bitter feelings all
around, led both committees to decide that past a certain point
such defect reports would be noted but not acted upon, insofar
as they were requests for changes in names of published characters
in the standards.

The *stability* of published character names is far more important
to the network of interdependent standards that refer to
character encoding standards than is the correctness of the name.

But wait! Reasonable people will say, "It's a standard. Of course
the name should be correct. And if it isn't correct, it should
be corrected, so the standard is correct."

I trust that is a fair summary of the position that E. Trager,
P. Kirk, S. Srivas, and others have been maintaining recently
on this topic.

To which I can only reiterate, from experience, that the *stability*
of published character names is far more important than is the
correctness of the names.

People who are using the Unicode Standard need to wrap their
heads around the reality that it is a *character encoding
standard*. It is *not* the Universal Encyclopedia of Writing
Systems and Character Identity.

Unicode character names are normative for the purposes of the
character encoding standard and those other IT standards that
reference it. They are also *immutable*, by action of both
SC2 and the UTC, because change of character names is almost
as disruptive of the standards as changing code points for
characters would be.
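
To make the disruption concrete, here is a minimal Python sketch of how
software ends up treating a published name as an identifier. It is only an
illustration, not anything taken from either standard: it relies on Python's
standard `unicodedata` module and the `\N{...}` string escape, and the helper
name `resolve` is hypothetical.

```python
import unicodedata

def resolve(name):
    """Return the character for a published Unicode name, or None if unknown."""
    try:
        return unicodedata.lookup(name)
    except KeyError:
        return None

# The name published since Unicode 2.0 resolves to U+00E6.
ae = resolve("LATIN SMALL LETTER AE")
print(f"U+{ord(ae):04X}")

# Source code can embed the name directly; this literal only parses
# because the name is still the one published in the standard.
assert "\N{LATIN SMALL LETTER AE}" == "\u00e6"

# Going the other way, the code point maps back to the same name.
assert unicodedata.name("\u00e6") == "LATIN SMALL LETTER AE"
```

Any program, data file, or derived standard written this way would stop
working if the published name changed again.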

This does *NOT* mean that the Unicode Standard is dictating to
anyone what the name of some letter in their writing system
should properly be, whether in English or in any other language.

That this is the case should be obvious from ASCII characters,
which, after all, have a long history of this kind of concern,
well predating Unicode's involvement in character encoding.
Take U+002F SOLIDUS. Not one American English speaker in
100,000 would call '/' a "solidus". Its name is "slash" or,
for older speakers, perhaps "slanted bar", and so forth.
Use the term "solidus" and everyone will look blankly at you,
except Classics professors wondering what Roman money has to
do with it or programming geeks and character encoding mavens,
who know the term because they read ASCII code charts.

> But even if the mis-named and mis-spelled characters in
> the Unicode Standard are not changed, there really is
> nothing stopping me (or you) from displaying what I believe
> are more correct names for these characters in some
> website, software, or document that I might write.

Correct. Note that there is exactly one Unicode names policeman --
Michael Everson -- and he does not arrest people who display
alternative names for Unicode characters.

Nobody is going to object to people reading:

www.foo.com/index.html

as "dubdubdub dot foo dot com slash index dot aitch tee em el"

instead of:

"LATIN SMALL LETTER W LATIN SMALL LETTER W LATIN SMALL LETTER W
FULL STOP LATIN SMALL LETTER F LATIN SMALL LETTER O LATIN
SMALL LETTER O FULL STOP LATIN SMALL LETTER C LATIN SMALL
LETTER O LATIN SMALL LETTER M SOLIDUS LATIN SMALL LETTER I
LATIN SMALL LETTER N LATIN SMALL LETTER D LATIN SMALL LETTER E
LATIN SMALL LETTER X FULL STOP LATIN SMALL LETTER H
LATIN SMALL LETTER T LATIN SMALL LETTER M LATIN SMALL LETTER L"
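
For anyone who wants to generate the long form mechanically, the following
Python sketch spells out a string using the normative names. It assumes
Python's standard `unicodedata` module; the function name `spell_out` is
made up for this illustration.

```python
import unicodedata

def spell_out(text):
    """Join the normative Unicode name of every character in text.

    unicodedata.name() raises ValueError for characters that have no
    name (control characters, for example); that case is not handled here.
    """
    return " ".join(unicodedata.name(ch) for ch in text)

print(spell_out("www.foo.com/index.html"))
```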

One of the reasons *why* the Unicode Standard publishes many
aliases in the Unicode names list is that there often are
much better, more communicative names for particular characters,
*EVEN IN ENGLISH*, than the normative names in the data file.

> In this case, common best practices can make up
> for imperfections in the standard itself.

Yes. As long as they are not mis-represented as corrections
*to* the standard, but instead as alternative, more useful
names for characters *in* the standard.
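
One way an application might do this is sketched below in Python: friendlier
labels are layered on top of the normative names rather than replacing them.
Everything here is an assumption made for illustration. The table
`FRIENDLY_NAMES` and the functions `display_name` and `describe` are
hypothetical, and the labels in the table are not names from the standard.

```python
import unicodedata

# Hypothetical application-level table of friendlier labels.
# These are display labels only, not names from the standard.
FRIENDLY_NAMES = {
    "/": "slash",
    ".": "dot",
}

def display_name(ch):
    """Prefer a friendly label, else fall back to the normative name."""
    friendly = FRIENDLY_NAMES.get(ch)
    if friendly is not None:
        return friendly
    # Unnamed characters (e.g. controls) fall back to a U+XXXX label.
    return unicodedata.name(ch, f"U+{ord(ch):04X}")

def describe(ch):
    """Show the friendly label alongside, not instead of, the normative name."""
    normative = unicodedata.name(ch, "<unnamed>")
    return f"{display_name(ch)} (U+{ord(ch):04X} {normative})"

print(describe("/"))
```

Keeping the code point and normative name visible next to the friendly label
makes it clear that the label is an alternative name, not a correction.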

--Ken

This archive was generated by hypermail 2.1.5 : Tue Apr 12 2005 - 16:02:58 CST