From: Michael Everson (firstname.lastname@example.org)
Date: Fri Aug 14 2009 - 19:24:30 CDT
For many years I have thought about these last few stragglers, and I
have long been convinced that the right thing to do is to recognize
that the original unification was the mistake, and that it is high
time to disunify them, as we did finally with Kurdish Cyrillic Ԛ and
Ԝ from Latin Q and W. (These are new characters and your system fonts
may not have them; I use the splendid Everson Mono for my e-mail and
have no difficulty.)
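Those Kurdish letters are separately encoded Cyrillic characters; a minimal Python sketch using the standard unicodedata module shows they are distinct codepoints from Latin Q and W:

```python
import unicodedata

# Kurdish Cyrillic QA and WE are distinct codepoints from Latin Q and W.
for ch in ("Ԛ", "Ԝ", "Q", "W"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+051A CYRILLIC CAPITAL LETTER QA
# U+051C CYRILLIC CAPITAL LETTER WE
# U+0051 LATIN CAPITAL LETTER Q
# U+0057 LATIN CAPITAL LETTER W
```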
Of course we all know that these derive ultimately from Greek printing
types. Go back to 1703, when Edward Lhuyd used Χ and χ (the latter
with a wiggly glyph shape that is most annoying today) for Celtic [x].
Go back to Lepsius' Universal Alphabet, published in 1863. Yes, he
borrows Greek δ and γ and χ and θ, but he assimilates them into the
Latin alphabet: abcdδefgγhijkχlmnopqrstθuvwxyz.
Somewhere between 1900 and 1932 (I lack a particular reference book at
the moment, though I have ordered it) x and χ were distinguished in the IPA.
Now we all know that modern notions of "script" and "character" are,
well, modern. But even in the 1949 Principles of the International
Phonetic Association it was quite clear that the borrowings from other
alphabets into the IPA were intended to be *naturalizations*, not just
temporary visiting. (In what follows, the comments in square brackets
are mine; the parenthetical "(in vertical form)" was the Handbook's.)
"The non-roman letters of the International Phonetic Alphabet have
been designed as far as possible to harmonise well with the roman
letters. The Association does not recognise makeshift letters; it
recognises only letters which have been carefully cut so as to be in
harmony with the other letters. For instance, the Greek letters
included in the International Alphabet are cut in roman adaptations.
Thus, since the ordinary shape of the Greek letter β [the common
italic upright glyph] does not harmonise with roman type in the
International Phonetic Alphabet it is given the form β [vertical like
Latin ß with strong serif in the descender]. And of the two forms of
Greek theta, θ and ϑ [both the italic upright glyphs], it has been
necessary to choose the first (in vertical form), since the second
cannot be made to harmonise with roman letters."
By "non-roman" there is, of course, no possibility of understanding that
the intent was for these characters to be unified with Greek, as was done in Unicode 1.0.
I am sure that I have seen German sharp-s ß pressed into use for an
IPA beta because it was available. Perhaps a Greek beta was not.
Perhaps a Greek beta in the available font was unsuitable. In my view
the specific IPA beta is (and ought to be) as regularly different from
the Greek beta as IPA gamma is.
In fact, this was specified by David Abercrombie in a chapter on
Notation in Elements of General Phonetics.
"A good source from which letters can be borrowed is the Greek
alphabet, and β γ δ ε θ φ χ [the common italic upright glyphs,
the lunate epsilon], for example, have been made use of for centuries
in roman-based phonetic notations. Borrowed Greek letters are
sometimes redesigned so as to fit in with the general appearance of
roman letters. The preceding six characters, for example, have for
this reason been modified as follows: <β> ɣ ɛ <θ> ɸ <χ>."
As of Unicode 5.1, we have Latin delta ẟ at 1E9F.
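That addition is easy to verify against the character database; a minimal Python sketch:

```python
import unicodedata

# LATIN SMALL LETTER DELTA (U+1E9F), added in Unicode 5.1, is a Latin
# character in its own right, distinct from Greek delta (U+03B4).
print(unicodedata.name("\u1E9F"))  # LATIN SMALL LETTER DELTA
print(unicodedata.name("\u03B4"))  # GREEK SMALL LETTER DELTA
```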
To talk about design for a moment.
Latin IPA theta is supposed to be a tall, vertical o with a bar
through the middle. Some Greek fonts render 03B8 like this, but some
have it sloped, and some have it looped like 03D1. I know that math
people use those two characters distinctively, and in their
specialized fonts they can guarantee that 03B8 always looks the way they want it to.
We don't really have that luxury in the "real" world of text, where
plain language text (even in Greek!) and IPA transcriptions co-exist.
I'm typesetting a book now in Baskerville, and I'm using Baskerville
IPA and Baskerville Greek in it. I'm glad I don't have to use IPA beta
because the Baskerville Greek beta is not correct. It's vertical. But
it's been designed for the other Greek letters, not for Latin. The
Greek theta could pass for the Latin, but the weight of the Greek chi
is exactly the reverse of the expected weight for the Latin chi: in
Latin the thick leg should be the northeast-southwest leg, but it is
the reverse for the Greek.
Indeed, the IPA chi is different from the long x used in Germanicist
dialectology; I would not call the character in http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3555.pdf
"chi" -- it is a stretched x, because its northwest-southeast leg is
thick, the opposite of what the IPA Handbook and Abercrombie specify.
And, of course, I can't sort IPA material containing beta, theta, or
chi within Latin while they remain unified with Greek. VS1? No. That's
pseudo-encoding. It's not going to guarantee better
font support -- while character disunification surely will, in time.
The problem is the false unification. That has some impact on legacy
data, but there are still probably more non-UCS IPA fonts in use than
there are Unicode-based IPA fonts. In the long run we will be better
off with the disunification.
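The sorting problem is easy to reproduce: under a naive codepoint sort, Greek beta (U+03B2) orders after every Latin letter, so a transcription using the unified character falls outside the Latin sequence (a minimal Python sketch; the sample words are invented):

```python
# With beta unified with Greek, a plain codepoint sort pushes any form
# containing it past the end of the Latin alphabet.
words = ["aba", "aβa", "aza"]  # invented sample transcriptions
print(sorted(words))  # ['aba', 'aza', 'aβa'] - beta sorts after z
```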
I want my Greek fonts to be Greek, not compromises with Latin. And I
want my IPA fonts to be Latin, not compromises with Greek. Not for the
sake of three letters. It makes no sense. Added to the fact that I may
need to sort multilingual multiscript data -- and we end up with
EXACTLY the same argument we had for Kurdish QA and WE.
> It's not been a design point for the standard to support "single
> font" display of IPA intermixed with regular text. The idea was that
> the IPA would use a font suitable for IPA (i.e. using all the
> shapes as best for IPA) while all the text portions would use a font
> that had the other shapes (and therefore is unsuitable for IPA).
You're wrong here. It is perfectly possible to use an ordinary font
which contains IPA characters. Indeed it is hardly unusual to find
Latin text fonts (not so much display fonts of course) with IPA glyphs
in them. IPA isn't all that complex; there's no complex shaping
behaviour, at least not for the letters. So I don't believe that "the
design point" of the standard considered IPA to be "complex" in this
way. The only problem we have is the three letters borrowed from
Greek. That's it. A single font that supports Latin and Greek and
includes the IPA characters runs into this problem. That's what
Andreas has come up against. And so have I.
> Adding a new character code for the shape is a non-starter. It would
> make all existing IPA either invalid or ambiguous. Not something you
> can do 20 years after the first draft of the standard that contained
> both Greek and IPA.
Nonsense. You can certainly do so in a standard you expect to be used
for 80 or 100 years or more. This is a disunification that should have
already happened. LATIN SMALL LETTER DELTA got encoded (for phonetic
purposes, and it was used widely since Lepsius before ETH was used in
IPA) for Unicode 5.1. It wasn't too late for that. The Variation
Selector doesn't solve the problem of sorting, either. Certainly not
in a way that any ordinary user could avail of. Give us the three
characters, and we who make the fonts for the users won't have any
difficulties at all, and the UCA can sort them within Latin instead of within Greek.
> I’d venture a guess that most linguists aren’t too concerned
> about the exact shape of the beta and theta for daily work, but are
> concerned only when publishing. (And, in many cases, it won’t be
> the linguist themself but rather the journal editor who cares.)
You're pretty far removed from the game, I think. I'm involved in
grammar and dictionary production now, and good typography and
harmonized fonts are a concern.
> When typesetting for print publication, one selects fonts to suit
> the specific requirements of the publication. Perhaps Greek-speaking
> linguists have problems. I don’t know how many linguistic papers
> are written and published in Greek.
Come on, Peter. Greek is often cited in anything to do with European linguistics.
> I would think that, at some point, the CSS working group within W3C
> will want to embrace OpenType. And in OpenType there are a few
> different ways these glyph variants could be accommodated. But until
> then, Unicode variation sequences are a possible alternative – if
> there truly is a need.
The need has been there for years, and it's not about web fonts. The
sorting issue is not trivial (and was never addressed for Kurdish
Cyrillic until disunification made it moot).
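The sorting objection to variation selectors can be made concrete: a variation sequence leaves the base codepoint Greek, so codepoint-based comparison still groups it with Greek (an illustrative Python sketch; note that no variation sequence for beta is actually registered, so the sequence here is hypothetical):

```python
VS1 = "\uFE00"             # VARIATION SELECTOR-1
beta_seq = "\u03B2" + VS1  # hypothetical Latin-shaped beta; still Greek underneath
# The selector only requests a glyph; the base codepoint is unchanged,
# so the sequence still sorts after every plain Latin letter.
print(sorted(["z", beta_seq]))  # 'z' first; beta, even with VS1, comes after
```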
> Don't forget there's chi, as well as beta and theta.
> As a hard-core IPAist, who type my phonological papers in a text
> editor using a single Unicode font (the necessary font switches for
> print being LaTeX markup), I would naturally prefer to have separate
> characters for IPA chi, beta, theta.
And this view is less uncommon (indeed more common) than Unicodists
might expect or imagine.
> However, I do have some qualms about this: why do I not also need a
> separate ipa "a" - I might be using a font in which the normal "a" is
> actually ɑ-shaped! Indeed, really I would like separate codepoints for
> all IPA letters - but we know that would fail dismally in practice,
> even had it been implemented from the start.
In such a situation, you draw the a like ɑ (script a), and you draw
the ɑ (script a) like α (alpha).
> Given the situation as it is, I support the idea of variation selectors.
I don't. I support disunification.
> Now that the role of variation selectors has become clearer and more
> widespread (IVD), it makes sense to identify places where it
> belongs, but hasn't been used.
You're dreaming. CJK Ideographs have some problems, and the people who
have to deal with those will deal with the IVD. That's a whole
different world from ordinary Western phonetics scholarship.
> In both IPA and mathematical usage, there are characters which must
> not be rendered with some of the otherwise normal choices of glyphs,
> lest the notation become ambiguous. It's the "a" with handle at code
> point 0061 for IPA; for math it would be ensuring the looped form
> of "phi" at 03C6, so that it contrasts to 03D5, etc.
Um, I don't follow your reasoning here. IPA distinguishes 0061 and
0251. We distinguish Latin gamma 0263 from Greek gamma 03B3.
> So far, the official recommendation has simply been "don't use a
> font that's unsuitable for IPA (resp. unsuitable for math)".
Math is hard and complex to render. IPA should be easy. Certainly
these three letters do not merit trickery.
> This requirement is only known to the author of the text, so such
> text can only be created in those rich text formats where control
> over the font choice rests with the author of the document. (Which,
> notoriously, is not the fact for HTML, a rich text format where font
> substitution somewhere between author and reader is more than likely).
As a reader of IPA text I care if it is rendered correctly. I also
care that it is encoded correctly.
> With the use of variation selectors, in contrast, it would be
> possible to encode the requirement for the restricted range of
> glyphic variation in the text.
Trickery. Greek Beta is not Latin Beta and wasn't considered so by the
founders of the IPA. The mistake was in the Unicode (or ISO) unification.
> If a font supported the variation sequence, it would be used,
> otherwise the software would be able to substitute a font that
> supported it, or was otherwise known to provide a suitable glyph.
Hackery. I don't think we want this.
> For mathematics, the problem is perhaps less acute, since it's
> already necessary for software (or document formats) to split the
> text into math and non-math runs of text in many instances, because
> the formatting and layout rules differ. But wherever individual
> variables are cited in the accompanying text such identification
> might break down, and variation selectors might still be useful.
I mix IPA orthography with other orthography constantly. No splitting.
> Furthermore, the use of variation selectors would enable a larger
> set of fonts to serve mixed use, wherever certain notations have
> arbitrary glyph requirements that only partially overlap with the
> natural glyph range for a given character. (There'd still be enough
> fonts that remain unsuitable for notational purposes, but there'd be
> a larger choice of "safe" plaintext fonts).
Completely false. Encoding Latin beta, theta, and chi would enable a
larger set of fonts to serve mixed use, because Latin beta, theta, and
chi can be there, glyphically and in sorting, distinct from the Greek letters.
> By adding a few variation selectors, the encoding model could be
> made complete. Alternate character codes would continue to exist
> whenever symbols/entities are used in contrast with each other.
Baskerville Latin Chi and Baskerville Greek Chi should look different
from each other.
Michael Everson * http://www.evertype.com/
This archive was generated by hypermail 2.1.5 : Fri Aug 14 2009 - 19:29:22 CDT