From: Hans Aberg ([email protected])
Date: Thu Jun 02 2005 - 12:32:45 CDT
At 08:04 -0700 2005/06/02, Doug Ewell wrote:
>Glyphs in a font do not have to be associated 1-to-1 with Unicode code
>points. Indeed, they must not, if they are able to handle certain
>context-dependent scripts.
>
>There is no need to encode additional precomposed Latin ligatures, and
>they will not be encoded.
It seems to me that, when considering a new glyph, one should strive to
figure out whether there is any semantic value to it; if not, it should
probably not be added. Figuring out the semantic value can sometimes
be easy, sometimes difficult. For example, in some scripts, the
ligature "ae" is a separate letter, and it should obviously be added
based on that. In English, this ligature is interchangeable with the
letter combination "ae". So based on only that, it might first be
thought it should not be added. But, by the use of this ligature,
there is a communication of the etymology of the word in the
spelling, and this is a kind of semantic value. One must then judge
whether this value is important enough to justify an addition.
So, switching to the glyph "fi", most readers would not even notice
it is there; so its semantic value is zero. It is only a rendering
technique. But, for example, in math, any glyph could in principle be
used. A mathematician could pick up the glyph "fi", or another
ligature, and assign it a special value. This is not likely though,
and even if some did that with some glyph, the semantic value of
doing so might not be considered enough for an addition to the
Unicode character set. The mathematician in question could do what
mathematicians often have done in the past, pick another glyph, and
the writings would not suffer in semantic presentation. It could, of
course, happen that some glyphs become common and more acceptable,
and should be added based on that principle. One example is the
MATHEMATICAL DOUBLE-STRUCK letters. These originally only existed for
a few capital letters, used to free other letters in the case of
common, standard sets. They are often hated by typographers, it
seems, who find them ugly, but loved by mathematicians, because of
their usefulness. Gradually, mathematicians have wanted all English
letters added in this series. Because the series grew gradually,
these letters have somewhat irregular Unicode code points, not all in
adjacent positions.
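The non-adjacent numbering can be checked directly. The following is
only a small sketch in Python, using the standard unicodedata module;
the helper name double_struck_capital is made up for this
illustration. It assumes that a capital missing from the mathematical
series is found under the older DOUBLE-STRUCK CAPITAL name instead.

```python
import unicodedata


def double_struck_capital(letter: str) -> str:
    """Return the double-struck form of an ASCII capital letter.

    Tries the name used in the contiguous mathematical series first,
    then falls back to the older letterlike-symbol name.
    """
    candidates = (
        f"MATHEMATICAL DOUBLE-STRUCK CAPITAL {letter}",
        f"DOUBLE-STRUCK CAPITAL {letter}",
    )
    for name in candidates:
        try:
            return unicodedata.lookup(name)
        except KeyError:
            continue  # no character with this name; try the next one
    raise KeyError(f"no double-struck capital for {letter!r}")


if __name__ == "__main__":
    # Print the code point of a few capitals to show the gaps.
    for letter in "ABCDNQRZ":
        ch = double_struck_capital(letter)
        print(f"{letter} -> U+{ord(ch):04X} {unicodedata.name(ch)}")
```

The capitals that were in use first (C, N, Q, R, Z, among others) sit
in the U+21xx range, while the rest of the series sits in the U+1D5xx
range with holes left where those earlier letters would have gone.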
So there are a number of principles and judgements involved, and they
may evolve slowly over time.
>You can bet that the keepers of the Unicode Standard will not
>"re-invent" it by renouncing the core technical principles that have
>guided them for 14 years. This kind of "thinking outside the box" is
>highly prized in marketing and industry invention, but it is a death
>blow for an interoperable standard.
The current Unicode character set is a mixed bag, assembled
empirically rather than derived from general principles that are
then specialized to the particular case at hand.
Clearly computer technology will evolve. This is very apparent in the
case of ligatures: The ligatures needed in the past are no longer
needed in more advanced rendering systems.
There are different ways to cope with such changes. One way is to
declare that the Unicode character set is as it is. Then one would
design a new character set, of course in some way upwards compatible
with the current Unicode character set, but removing and streamlining
the parts that are not needed in a more advanced computer technology.
But it is also fully possible to admit such changes within the
current character set, by simply adding the new features, and marking
down the usage by means of property fields. In some sense, this
is simpler, as one will want to have access to all Unicode characters
anyway, for backwards compatibility. In the case of the ligature
"fi", one might add a redirection to the letter combination "fi". In
the case of the ligature "ae", one could not do so, as it cannot
always be replaced. If there is some script where this change can
always be done, one can add information about that, say via special
script abstract characters, so that the redirection can take place.
This way, the ligatures already added to the Unicode set which are
only used for rendering purposes, can successively be put out of use,
resulting in a cleaner, more semantically oriented core.
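Something close to this redirection can already be observed in the
existing compatibility decompositions. The following is a minimal
sketch in Python, using the standard unicodedata module; the helper
name redirect is made up for this illustration, and NFKC
normalization merely stands in for the redirection property suggested
above. It is not a proposal for how such a property would be defined.

```python
import unicodedata


def redirect(text: str) -> str:
    """Replace compatibility characters by their NFKC equivalents.

    Characters without a compatibility mapping are left unchanged.
    """
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    # U+FB01 is the "fi" ligature; U+00E6 is the letter "ae".
    for ch in ("\ufb01", "\u00e6"):
        print(
            f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
            f"decomposition={unicodedata.decomposition(ch)!r}, "
            f"redirected={redirect(ch)!r}"
        )
```

Here U+FB01 carries a compatibility mapping to the two letters "f"
and "i", so it is redirected, while U+00E6 has no decomposition at
all and passes through unchanged, which matches the distinction drawn
between the two ligatures.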
Perhaps the picture I have described above is too far away in the
future for some to focus on it. But it seems to me that others are
already thinking along these lines. So it will happen; the question
is only how.
-- Hans Aberg