Re: Greek characters in IPA usage

From: Michael Everson (
Date: Fri Aug 14 2009 - 19:24:30 CDT

    For many years I have thought about these last few stragglers, and I
    have long been convinced that the right thing to do is to recognize
    that the original unification was the mistake, and that it is high
    time to disunify them, as we did finally with Kurdish Cyrillic Ԛ and
    Ԝ from Latin Q and W. (These are new characters and your system fonts
    may not have them; I use the splendid Everson Mono for my e-mail and
    have no difficulty.)

    Of course we all know that these derive ultimately from Greek printing
    types. Go back to 1703, where Edward Lhuyd uses Χ and χ (the latter
    with a wiggly glyph shape that is most annoying today) for Celtic [x].

    Go back to Lepsius' Universal Alphabet, published in 1863. Yes, he
    borrows Greek δ and γ and χ and θ, but he assimilates them into the
    Latin alphabet: abcdδefgγhijkχlmnopqrstθuvwxyz.

    Somewhere between 1900 and 1932 (I lack a particular reference book at
    the moment though I have ordered it) x and χ were distinguished in the

    Now we all know that modern notions of "script" and "character" are,
    well, modern. But even in the 1949 Principles of the International
    Phonetic Association it was quite clear that the borrowings from other
    alphabets into the IPA were intended to be *naturalizations*, not just
    temporary visiting. (In what follows, the comments in square brackets
    are mine; the parenthetical "(in vertical form)" was the Handbook's.)

    "The non-roman letters of the International Phonetic Alphabet have
    been designed as far as possible to harmonise well with the roman
    letters. The Association does not recognise makeshift letters; it
    recognises only letters which have been carefully cut so as to be in
    harmony with the other letters. For instance, the Greek letters
    included in the International Alphabet are cut in roman adaptations.
    Thus, since the ordinary shape of the Greek letter β [the common
    italic upright glyph] does not harmonise with roman type in the
    International Phonetic Alphabet it is given the form β [vertical like
    Latin ß with strong serif in the descender]. And of the two forms of
    Greek theta, θ and ϑ [both the italic upright glyphs], it has been
    necessary to choose the first (in vertical form), since the second
    cannot be made to harmonise with roman letters."

    By "non-roman" of course there is no possibility of understanding that
    the intent was to unify these characters as was done in Unicode 1.0.

    I am sure that I have seen German sharp-s ß pressed into use for an
    IPA beta because it was available. Perhaps a Greek beta was not.
    Perhaps a Greek beta in the available font was unsuitable. In my view
    the specific IPA beta is (and ought to be) as regularly different from
    the Greek beta as IPA gamma is.

    In fact, this was specified by David Abercrombie in a chapter on
    Notation in Elements of General Phonetics.

    "A good source from which letters can be borrowed is the Greek
    alphabet, and β γ δ ε θ φ χ [the common italic upright glyphs,
    the lunate epsilon], for example, have been made use of for centuries
    in roman-based phonetic notations. Borrowed Greek letters are
    sometimes redesigned so as to fit in with the general appearance of
    roman letters. The preceding six characters, for example, have for
    this reason been modified as follows: <β> ɣ ɛ <θ> ɸ <χ>."

    As of Unicode 5.1, we have Latin delta ẟ at 1E9F.
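
    To make the encoding situation concrete, here is a minimal sketch,
    assuming Python's standard unicodedata module and an interpreter
    whose Unicode data is at version 5.1 or later. It prints the
    character names of the letters discussed above; the helper name
    script_word is purely illustrative.

```python
import unicodedata

# Letters mentioned in the text, keyed by code point.
letters = {
    "\u1e9f": "Latin delta, encoded in Unicode 5.1",
    "\u0263": "redesigned gamma",
    "\u025b": "redesigned epsilon",
    "\u0278": "redesigned phi",
    "\u03b2": "beta as used in IPA text",
    "\u03b8": "theta as used in IPA text",
    "\u03c7": "chi as used in IPA text",
}

def script_word(ch):
    """Return the first word of the character name, e.g. 'LATIN' or 'GREEK'."""
    return unicodedata.name(ch).split()[0]

for ch, note in letters.items():
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}  ({note})")
```

    In this listing the delta, gamma, open e, and phi carry LATIN names,
    while beta, theta, and chi are still the GREEK characters.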

    To talk about design for a moment.

    Latin IPA theta is supposed to be a tall, vertical o with a bar
    through the middle. Some Greek fonts render 03B8 like this, but some
    have it sloped, and some have it looped like 03D1. I know that math
    people use those two characters distinctively, and in their
    specialized fonts they can guarantee that 03B8 always looks the way
    they want.

    We don't really have that luxury in the "real" world of text, where
    plain language text (even in Greek!) and IPA transcriptions co-exist.
    I'm typesetting a book now in Baskerville, and I'm using Baskerville
    IPA and Baskerville Greek in it. I'm glad I don't have to use IPA beta
    because the Baskerville Greek beta is not correct. It's vertical. But
    it's been designed for the other Greek letters, not for Latin. The
    Greek theta could pass for the Latin, but the weight of the Greek chi
    is exactly the reverse of the expected weight for the Latin chi: in
    Latin the thick leg should be the northeast-southwest leg, but it is
    the reverse for the Greek.

    Indeed, the IPA chi is different from the long x used in Germanicist
    dialectology; I should not name the character in
      chi -- it is a stretched x, because its northwest-southeast leg is
    thick, the opposite of what the IPA Handbook and Abercrombie specify.

    And, of course, I can't sort IPA material containing beta, theta, or
    chi correctly.
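
    The sorting problem can be sketched with a toy ordering. This is
    not the UCA; it is a hypothetical key (toy_key, SCRIPT_RANK) that
    only mimics one property of default collation, namely that Latin
    letters order before Greek ones. Within a script it falls back to
    code point order, which real collation does not do.

```python
import unicodedata

# Toy assumption: Latin sorts before Greek, everything else after.
SCRIPT_RANK = {"LATIN": 0, "GREEK": 1}

def toy_key(word):
    """Hypothetical sort key: one (script rank, code point) pair per character."""
    key = []
    for ch in word:
        script = unicodedata.name(ch).split()[0]
        key.append((SCRIPT_RANK.get(script, 2), ord(ch)))
    return key

# "aba", "aza", a word with Latin gamma U+0263, a word with Greek beta U+03B2
words = ["a\u03b2a", "aza", "a\u0263a", "aba"]
ordered = sorted(words, key=toy_key)
print(ordered)
```

    Under this key the word written with U+03B2 lands after every
    Latin-only word, including the one with Latin gamma, because the
    beta is classified as Greek.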

    VS1? No. That's pseudo-encoding. It's not going to guarantee better
    font support -- while character disunification surely will, in time.
    The problem is the false unification. That has some impact on legacy
    data, but there are still probably more non-UCS IPA fonts in use than
    there are Unicode-based IPA fonts. In the long run we will be better
    off with the disunification.

    I want my Greek fonts to be Greek, not compromises with Latin. And I
    want my IPA fonts to be Latin, not compromises with Greek. Not for the
    sake of three letters. It makes no sense. Added to the fact that I may
    need to sort multilingual multiscript data -- and we end up with
    EXACTLY the same argument we had for Kurdish KU and WE.

    Asmus said:

    > It's not been a design point for the standard to support "single
    > font" display of IPA intermixed with regular text. The idea was that
    > the IPA would use a font suitable for IPA (i.e. using all the
    > shapes as best for IPA) while all the text portions would use a font
    > that had the other shapes (and therefore is unsuitable for IPA).

    You're wrong here. It is perfectly possible to use an ordinary font
    which contains IPA characters. Indeed it is hardly unusual to find
    Latin text fonts (not so much display fonts of course) with IPA glyphs
    in them. IPA isn't all that complex; there's no complex shaping
    behaviour, at least not for the letters. So I don't believe that "the
    design point" of the standard considered IPA to be "complex" in this
    way. The only problem we have is the three letters borrowed from
    Greek. That's it. A single font that supports Latin and Greek and
    includes the IPA characters runs into this problem. That's what
    Andreas has come up against. And so have I.

    > Adding a new character code for the shape is a non-starter. It would
    > make all existing IPA either invalid or ambiguous. Not something you
    > can do 20 years after the first draft of the standard that contained
    > both Greek and IPA.

    Nonsense. You can certainly do so in a standard you expect to be used
    for 80 or 100 years or more. This is a disunification that should have
    already happened. LATIN SMALL LETTER DELTA got encoded (for phonetic
    purposes, and it was used widely since Lepsius before ETH was used in
    IPA) for Unicode 5.1. It wasn't too late for that. The Variation
    Selector doesn't solve the problem of sorting, either. Certainly not
    in a way that any ordinary user could avail of. Give us the three
    characters, and we who make the fonts for the users won't have any
    difficulties at all, and the UCA can sort them within Latin instead of
    within Greek.
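
    A small sketch of why a variation selector leaves the sorting
    question untouched, assuming Python's unicodedata module. The
    sequence U+03B2 U+FE00 used here is hypothetical and serves only as
    an illustration; the helper base_scripts is likewise invented for
    the example.

```python
import unicodedata

VS1 = "\ufe00"  # VARIATION SELECTOR-1

def base_scripts(text):
    """Script word (first word of the name) of each non-selector character."""
    return [
        unicodedata.name(ch).split()[0]
        for ch in text
        if not unicodedata.name(ch).startswith("VARIATION SELECTOR")
    ]

plain = "\u03b2"         # Greek beta
tagged = "\u03b2" + VS1  # hypothetical "IPA beta" variation sequence

print(len(plain), len(tagged))
print(base_scripts(plain), base_scripts(tagged))
```

    The tagged string is one code point longer, but its base character
    is still the Greek beta, so any ordering that goes by the base
    character treats both strings as Greek.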

    Peter said:

    > I’d venture a guess that most linguists aren’t too concerned
    > about the exact shape of the beta and theta for daily work, but are
    > concerned only when publishing. (And, in many cases, it won’t be
    > the linguist themself but rather the journal editor who cares.)

    You're pretty far removed from the game, I think. I'm involved in
    grammar and dictionary production now, and good typography and
    harmonized fonts are a concern.

    > When typesetting for print publication, one selects fonts to suit
    > the specific requirements of the publication. Perhaps Greek-speaking
    > linguists have problems. I don’t know how many linguistic papers
    > are written and published in Greek.

    Come on, Peter. Greek is often cited in anything to do with European
    or Indo-European.

    > I would think that, at some point, the CSS working group within W3C
    > will want to embrace OpenType. And in OpenType there are a few
    > different ways these glyph variants could be accommodated. But until
    > then, Unicode variation sequences are a possible alternative – if
    > there truly is a need.

    The need has been there for years, and it's not about web fonts. The
    sorting issue is not trivial (and was never addressed for Kurdish
    Cyrillic until disunification made it moot).

    Julian said:

    > Don't forget there's chi, as well as beta and theta.
    > As a hard-core IPAist, who type my phonological papers in a text
    > editor using a single Unicode font (the necessary font switches for
    > print being LaTeX markup), I would naturally prefer to have separate
    > characters for IPA chi, beta, theta.

    And this view is less uncommon (indeed more common) than Unicodists
    might expect or imagine.

    > However, I do have some qualms about this: why do I not also need a
    > separate ipa "a" - I might be using a font in which the normal "a" is
    > actually ɑ-shaped! Indeed, really I would like separate codepoints
    > for
    > all IPA letters - but we know that would fail dismally in practice,
    > even had it been implemented from the start.

    In such a situation, you draw the a like ɑ (script a), and you draw
    the ɑ (script a) like α (alpha).

    > Given the situation as it is, I support the idea of variation
    > selectors.

    I don't. I support disunification.

    Asmus said:

    > Now that the role of variation selectors has become clearer and more
    > widespread (IVD), it makes sense to identify places where it
    > belongs, but hasn't been used.

    You're dreaming. CJK Ideographs have some problems, and the people who
    have to deal with those will deal with the IVD. That's a whole
    different world from ordinary Western phonetics scholarship.

    > In both IPA and mathematical usage, there are characters which must
    > not be rendered with some of the otherwise normal choices of glyphs,
    > lest the notation become ambiguous. It's the "a" with handle at code
    > point 0041 for IPA; for math it would be ensuring the looped form
    > of "phi" at 03C5, so that it contrasts to 03D6, etc.

    Um, I don't follow your reasoning here. IPA distinguishes 0041 and
    0251. We distinguish Latin gamma 0263 from Greek gamma 03B3.

    > So far, the official recommendation has simply been "don't use a
    > font that's unsuitable for IPA (resp. unsuitable for math)".

    Math is hard and complex to render. IPA should be easy. Certainly
    these three letters do not merit trickery.

    > This requirement is only known to the author of the text, so such
    > text can only be created in those rich text formats where control
    > over the font choice rests with the author of the document. (Which,
    > notoriously, is not the fact for HTML, a rich text format where font
    > substitution somewhere between author and reader is more than likely).

    As a reader of IPA text I care if it is rendered correctly. I also
    care that it is encoded correctly.

    > With the use of variation selectors, in contrast, it would be
    > possible to encode the requirement for the restricted range of
    > glyphic variation in the text.

    Trickery. Greek Beta is not Latin Beta and wasn't considered so by the
    founders of the IPA. The mistake was in the Unicode (or ISO)

    > If a font supported the variation sequence, it would be used,
    > otherwise the software would be able to substitute a font that
    > supported it, or was otherwise known to provide a suitable glyph.

    Hackery. I don't think we want this.

    > For mathematics, the problem is perhaps less acute, since it's
    > already necessary for software (or document formats) to split the
    > text into math and non-math runs of text in many instances, because
    > the formatting and layout rules differ. But wherever individual
    > variables are cited in the accompanying text such identification
    > might break down, and variation selectors might still be useful.

    I mix IPA orthography with other orthography constantly. No splitting.

    > Furthermore, the use of variation selectors would enable a larger
    > set of fonts to serve mixed use, wherever certain notations have
    > arbitrary glyph requirements that only partially overlap with the
    > natural glyph range for a given character. (There'd still be enough
    > fonts that remain unsuitable for notational purposes, but there'd be
    > a larger choice of "safe" plaintext fonts).

    Completely false. Encoding Latin beta, theta, and chi would enable a
    larger set of fonts to serve mixed use, because Latin beta, theta, and
    chi can be there, glyphically and in sorting, distinct from the Greek

    > By adding a few variation selectors, the encoding model could be
    > made complete. Alternate character codes would continue to exist
    > whenever symbols/entities are used in contrast with each other.

    Baskerville Latin Chi and Baskerville Greek Chi should look different
    from each other.

    Michael Everson *

    This archive was generated by hypermail 2.1.5 : Fri Aug 14 2009 - 19:29:22 CDT