Re: (long) Re: Chromatic font research

From: Kenneth Whistler (kenw@sybase.com)
Date: Tue Jul 02 2002 - 18:27:06 EDT


[*groans in the audience*]

I know, I know -- another contribution in the endless thread...

In re:
 
> The Respectfully Experiment

> I used it as evidence that ideas about what should not be
> included in Unicode can change over a period of time as new scientific
> evidence is discovered.

Having been intimately involved in nearly all the decisions made
about what was included in Unicode over the last 13 years, and also
being formally trained as a scientist, I think I may be qualified
to dispute this conclusion.

Most of the changes in ideas about what can be included in Unicode
have been the result of two types of influence:

  A. The encountering of legacy practice in preexisting character
     encodings which had to be accommodated for interoperability
     reasons. This accounts for many, if not all, of the hinky little
     edge cases where Unicode appears to depart from its general
     principles for how to encode characters.

  B. The development of new processing requirements that required
     special kinds of encoded characters. This accounts for strange
     animals such as the bidi format controls, the BOM, the object
     replacement character, and the like (a small illustration
     follows this list).
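
A small illustration of those animals (a minimal sketch using Python's
unicodedata module; the three code points below are merely examples of
the categories just named):

    import unicodedata

    # Each of these is a fully encoded character whose reason for being
    # is a processing requirement rather than a piece of text.
    for cp in (0x202B, 0xFEFF, 0xFFFC):
        print("U+%04X" % cp, unicodedata.name(chr(cp)))
    # U+202B RIGHT-TO-LEFT EMBEDDING       (a bidi format control)
    # U+FEFF ZERO WIDTH NO-BREAK SPACE     (better known as the BOM)
    # U+FFFC OBJECT REPLACEMENT CHARACTER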

There is a very narrow window of opportunity for *scientific*
evidence contributing to this -- namely, the result of graphological
analysis of previously poorly studied ancient or minority scripts,
which conceivably could turn up some obscure new principle of writing
systems that would require Unicode to consider adding a new type of
character to accommodate it. But at this point, with Unicode having managed
to encode everything from Arabic to Mongolian to Han to Khmer..., I
consider it rather unlikely that scientific graphological study is going
to turn up many new fundamental principles here. As a scientific
*hypothesis* I think this surmise is proving to hold up rather well,
as our premier encoder of historic and minority scripts, Michael
Everson, has managed to successfully pull together encoding proposals,
based on current principles in Unicode, for dozens more scripts,
with little difficulty except for that inherent in extracting
information about rather poorly documented writing systems.

> it just seems to me that some
> extra ligature characters in the U+FB.. block would be useful.

Best practice, and near unanimous consensus in the Unicode Technical
Committee and among the correspondents on this list, would be
aligned with exactly the opposite opinion.
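
The ligatures already sitting in that block illustrate why. They are
compatibility characters, carried along for legacy round-trip
convertibility, and they normalize straight back to their component
letters; ligation of ordinary text is left to the font and the layout
engine. A minimal sketch, using Python's unicodedata module (nothing
here depends on any particular font):

    import unicodedata

    lig = "\uFB01"   # LATIN SMALL LIGATURE FI, from the U+FB.. block
    print(unicodedata.name(lig))               # LATIN SMALL LIGATURE FI
    print(unicodedata.decomposition(lig))      # <compat> 0066 0069
    print(unicodedata.normalize("NFKC", lig))  # 'fi' -- two plain letters

    # ZERO WIDTH JOINER (U+200D) can be inserted to *request* a more
    # ligated rendering of plain letters, without encoding anything new.
    print("respectfully".replace("ct", "c\u200Dt"))

In other words, the standard already provides a way to ask for a ct
ligature in "respectfully"; whether one appears is up to the font.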

> In the
> light of this new evidence, I am wondering whether the decision not to
> encode any new ligatures in regular Unicode could possibly be looked at
> again.

As others have pointed out, "The Respectfully Experiment" did not
constitute new *evidence* of anything in this regard.

In any case, the UTC is quite unlikely to look at that decision again.

The exception that the UTC *has* considered recently was the Arabic
bismillah ligature, and the reason for doing so was, once again,
legacy practice. This thing exists in implemented character encodings
as a single encoded character. Furthermore, it is used as a unitary
symbol: substituting an actual (long) string of Arabic letters for it,
and expecting the software to ligate that string correctly precisely
in the contexts where it is used as a symbol, would place an
unnecessary burden on both users and software implementations. That is
*quite* different from the position that one, two, or dozens more
Latin ligatures of two letters need to be given standard Unicode
encodings.
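
For concreteness, a minimal sketch using Python's unicodedata module
(the spelled-out string below is one simplified letter sequence,
without vowel marks, and is only illustrative):

    import unicodedata

    # The bismillah exists as a single encoded character, U+FDFD.
    bismillah = "\uFDFD"
    print(unicodedata.name(bismillah))
    # ARABIC LIGATURE BISMILLAH AR-RAHMAN AR-RAHEEM
    print(len(bismillah))    # 1 -- a unitary symbol

    # Spelled out letter by letter, it becomes a long string that the
    # software would then have to shape and ligate correctly wherever
    # the symbol is used.
    spelled_out = ("\u0628\u0633\u0645 \u0627\u0644\u0644\u0647 "
                   "\u0627\u0644\u0631\u062D\u0645\u0646 "
                   "\u0627\u0644\u0631\u062D\u064A\u0645")
    print(len(spelled_out))  # 22 code points in this spelling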

> if it cannot be done or would cause great anguish and
> arguments, well, that is that, forget it.

Good idea.

--Ken


