From: Asmus Freytag (email@example.com)
Date: Fri Aug 14 2009 - 21:46:22 CDT
Let me attempt to restate the problem more generically by looking at a
few examples of ambiguous use of characters and the unification issues involved.
Independent of how it is encoded, the Greek phi can be written in a
straight-backed and a loopy form. The choice tends to depend on the
typestyle for ordinary text. When you write equations, it depends on
which variable you mean (it doesn't matter whether these are complex
equations or whether you merely need to reference some quantities
commonly abbreviated with one or the other form of phi; in the latter
case, all discussion of the complexity of math layout simply doesn't
apply, so we can use math, or more specifically the use of certain
Greek letters to denote quantities, as an analogy). The phi is not the
only example of an ambiguous character.
For the hyphen, Unicode started out by encoding *three* characters: the
ambiguous one, the one that's definitely a hyphen, and the one that's
definitely a minus sign (actually four, because there's also the one
that's definitely an en-dash). For Greek phi, Unicode and 10646 provided
only two characters: the ambiguous one (from the regular Greek alphabet)
and the explicit technical one (GREEK PHI SYMBOL). What is missing is a
way to encode the unambiguous shape that contrasts with the shape
encoded as the technical symbol. As a result, some fonts have the
*same* glyph at both locations. Such fonts cannot be used for math (not
even baby math) requiring these Greek symbols.
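As a sketch of the current situation, the two encoded phi characters can
be inspected with Python's standard unicodedata module; note that the
compatibility decomposition of the technical symbol points back to the
ambiguous letter:

```python
import unicodedata

phi_letter = "\u03c6"  # GREEK SMALL LETTER PHI: the ambiguous character
phi_symbol = "\u03d5"  # GREEK PHI SYMBOL: the explicitly technical one

print(unicodedata.name(phi_letter))  # GREEK SMALL LETTER PHI
print(unicodedata.name(phi_symbol))  # GREEK PHI SYMBOL

# The technical symbol carries a compatibility decomposition back to the
# ambiguous letter, so NFKC/NFKD folds the two together:
print(unicodedata.normalize("NFKD", phi_symbol) == phi_letter)  # True
```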
This situation is entirely parallel to the IPA use of the Latin letter
"a". The form with single bowl has been encoded as IPA specific, but the
form with handle has not; there's only the ambiguous 0061. As a result,
any font that uses a single-bowl a at location 0061 will be "unsuitable" for
IPA. The situation for the Greek letters and IPA is similar, but not
identical, because "Latinized" forms don't necessarily fall into the
natural range of glyph variations for Greek letters (or you can at least
argue that). But otherwise these cases are not so different.
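The Latin case can be checked the same way: the single-bowl form was
encoded separately as 0251, but, unlike the phi pair, nothing in the
standard ties it back to the ambiguous 0061. A small Python check (my
illustration, not part of the standard's guidance) shows that even
compatibility normalization leaves the two distinct:

```python
import unicodedata

plain_a = "\u0061"  # LATIN SMALL LETTER A: the ambiguous character
ipa_a = "\u0251"    # LATIN SMALL LETTER ALPHA: the single-bowl, IPA-specific form

print(unicodedata.name(ipa_a))  # LATIN SMALL LETTER ALPHA
# No decomposition relates the two, so normalization does not conflate them:
print(unicodedata.normalize("NFKD", ipa_a) == plain_a)  # False
```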
Whenever you aspire to full plain text support for IPA (so that your
entire document can be in a single font), you will be limited by the
case of the 'a' as well as that of the Greek letters. Both will limit
the fonts that you can use for single-font mixed text/IPA documents.
That's the problem statement. Next come the boundary conditions.
If this discussion had taken place in 1988, or 1989, different boundary
conditions would have applied, because at that time, there were neither
existing data nor existing software using Unicode. Since then, this
situation has changed, and that change provides an important boundary
condition on any proposed solution.
An important fact to be considered is that all Unicode encoded text for
'a' with a handle or IPA Greek (or math loopy phi) has had to be encoded
using the ordinary Latin and Greek characters, respectively. That has been going on
for nearly 20 years now. If you suddenly switch to different
*characters* you will get massive trouble in searching and sorting IPA
text, because old and new text denoting the *same* pronunciation will
suddenly have differently encoded strings. Since they will look 100%
alike for some fonts (definitely true for the case of 'a' here), few
authors will even know which character they were using. Security minded
folks will go nuts at having even more perfect or near-perfect clones of
ordinary letters added to the standard.
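To make the searching problem concrete: if new, visually identical
characters were introduced, a naive substring search over old data would
silently fail. The existing a/ɑ pair already demonstrates the effect (a
hedged sketch; the hypothetical new characters themselves do not exist):

```python
# Old IPA data, encoded with ordinary U+0061 as it had to be for ~20 years:
old_text = "p\u0061t"  # looks like "pat"
# The same pronunciation written with a distinct single-bowl character:
new_text = "p\u0251t"  # looks (in many fonts) just like "pat"

print(new_text in old_text)  # False: a naive search misses the match
print(old_text == new_text)  # False: the *same* data compares unequal
```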
So far the boundary conditions, now for possible solutions.
There are two possibilities.
1.) You can provide new character codes for all notational use of
Latin/Greek letters where the glyphic repertoire is not identical to the
natural range of glyphs that these characters exhibit when written as
part of standard orthographies. If you do that, then please be complete,
so that the pain does not come in repeated waves. That means addressing
not just two Greek characters, but all the Latin and Greek characters
that require special glyph design to harmonize with certain notations.
The result will be that you can test fonts for their character
repertoire to find out whether they support the new characters. You need
to get all application vendors on board, so that sorting and searching
can conflate the new characters with the old ones that had to be used
before (and will continue to be used). Documents using the new
characters will depend on fonts supporting those characters. Until then,
they can only be exchanged in the context of font-embedding technologies.
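Under solution 1, every sorting and searching application would need a
conflation step mapping the new characters onto the old ones. A minimal
sketch, using the Private Use code point U+E000 purely as a stand-in for
a hypothetical newly encoded single-bowl 'a' (no such character exists):

```python
# Hypothetical new code point standing in for a "single-bowl a";
# U+E000 is Private Use and serves only as a placeholder in this sketch.
NEW_TO_OLD = {0xE000: 0x0061}

def conflate(s: str) -> str:
    """Fold hypothetical new notation characters onto their legacy equivalents."""
    return s.translate(NEW_TO_OLD)

# Old data (plain 0061) and new data (hypothetical character) now match:
print(conflate("p\ue000t") == "pat")  # True
```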
2.) You can provide a variation selector approach, where pairing a given
variation selector with an *ambiguous* character will identify the
preferred glyph shape. Well-written existing software would ignore the
VS, and give you fallback behavior. All new documents would display at
least as well as before, even in the absence of new fonts. Sort and
search applications, if written to the existing specifications of
Unicode, which require that a VS be ignored, would sort and search new
and old IPA data alike. All you need to do to get the new glyphs is to
have fonts supporting the Variation Sequences with new glyphs. You may
need to work with display engine suppliers to enable such font features
(but since such features are used for other scripts/contexts, this may
not be as hard as it looks).
Each of these possible solutions has a different mix of advantages and
disadvantages, and these need to be carefully weighed. In 2009, nearly
20 years after the inception of the standard, backwards compatibility
must carry a different weight than it did in 1989. That should enter
this discussion and not be brushed off.