Re: Greek characters in IPA usage

From: Asmus Freytag
Date: Fri Aug 14 2009 - 21:46:22 CDT

    Let me attempt to restate the problem more generically by looking at a
    few examples of ambiguous use of characters and the unification issues
    around them:

    Independent of how it is encoded, the Greek phi can be written in a
    straight-backed and a loopy form. In ordinary text, the choice tends
    to depend on the typestyle. In equations, it depends on which variable
    you mean. (It doesn't matter whether these are complex equations or
    whether you just need to reference some quantities commonly
    abbreviated with one or the other form of phi. In the latter case, the
    discussions about the complexity of math layout really don't apply, so
    we can use math, or more specifically the use of certain Greek letters
    to denote quantities, as an analogy.) The phi is not the only example
    of an ambiguous character.

    For the hyphen, Unicode started out coding *three* characters: the
    ambiguous character, the one that's definitely a hyphen and the one
    that's definitely a minus sign (actually four, because there's the one
    that's definitely an en-dash). For Greek phi, Unicode and 10646 gave
    only two characters: the ambiguous one (from the regular Greek
    alphabet) and the explicit technical one ("GREEK SYMBOL"). What is
    missing is a way to encode the unambiguous shape that contrasts with
    the shape encoded as the technical symbol. As a result, there are some
    fonts that have the *same* glyph at both locations. Such fonts cannot
    be used for math (not even baby math) requiring these Greek symbols.
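
    As a small illustration, the characters mentioned above can be listed
    with Python's standard unicodedata module. The specific code points
    chosen here (U+002D, U+2010, U+2212, U+2013, U+03C6, U+03D5) are an
    assumption about which characters the paragraph refers to; the message
    itself does not give them.

```python
import unicodedata

# Assumed code points for the characters discussed above.
HYPHEN_LIKE = ["\u002D", "\u2010", "\u2212", "\u2013"]
PHI_LIKE = ["\u03C6", "\u03D5"]


def describe(chars):
    """Return (code point label, character name) pairs for the given characters."""
    return [(f"U+{ord(c):04X}", unicodedata.name(c)) for c in chars]


# Print one line per character: the ambiguous one first, then the
# unambiguous ones.
for label, name in describe(HYPHEN_LIKE + PHI_LIKE):
    print(label, name)
```

    Note the asymmetry: the hyphen group has an explicit character for
    each reading, while the phi group has only the ambiguous letter and
    the symbol.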

    This situation is entirely parallel to the IPA use of the Latin letter
    "a". The form with a single bowl has been encoded as IPA-specific, but
    the form with a handle has not; there's only the ambiguous 0061. As a
    result, any font that uses a single-bowl a at location 0061 will be
    "unsuitable" for IPA. The situation for the Greek letters and IPA is
    similar, but not identical, because "Latinized" forms don't
    necessarily fall into the natural range of glyph variations for Greek
    letters (or you can at least argue that). But otherwise these cases
    are not so different.
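
    A minimal sketch of the 'a' case, assuming the IPA-specific
    single-bowl form meant here is U+0251 (the message does not name the
    code point). It also checks, using the standard unicodedata module,
    whether compatibility normalization (NFKC) treats each pair as the
    same character; the helper name same_after_nfkc is made up for this
    example.

```python
import unicodedata

AMBIGUOUS_A = "\u0061"    # ordinary Latin small a; glyph varies by font
SINGLE_BOWL_A = "\u0251"  # assumed IPA-specific single-bowl form


def same_after_nfkc(x, y):
    """True if x and y are identical after NFKC compatibility normalization."""
    return unicodedata.normalize("NFKC", x) == unicodedata.normalize("NFKC", y)


print(unicodedata.name(AMBIGUOUS_A))
print(unicodedata.name(SINGLE_BOWL_A))
# The two a's stay distinct under NFKC; the phi symbol folds to the letter.
print(same_after_nfkc(AMBIGUOUS_A, SINGLE_BOWL_A))
print(same_after_nfkc("\u03C6", "\u03D5"))
```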

    Whenever you aspire to full plain text support for IPA (so that your
    entire document can be in a single font), you will be limited by the
    case of the 'a' as well as that of the Greek letters. Both will limit
    the fonts that you can use for single-font mixed text/IPA documents.

    That's the problem statement. Next come the boundary conditions.

    If this discussion had taken place in 1988, or 1989, different boundary
    conditions would have applied, because at that time, there were neither
    existing data nor existing software using Unicode. Since then, this
    situation has changed, and provides an important boundary condition on
    the discussion.

    An important fact to be considered is that all Unicode encoded text for
    'a' with a handle or IPA Greek (or math loopy phi) has had to be encoded
    using the ordinary Latin or Greek characters, respectively. That has
    been going on for nearly 20 years now. If you suddenly switch to
    different *characters* you will get massive trouble in searching and
    sorting IPA text, because old and new text denoting the *same*
    pronunciation will suddenly have differently encoded strings. Since
    they will look 100% alike in some fonts (definitely true for the case
    of 'a' here), few authors will even know which character they were
    using. Security-minded folks will go nuts at having even more perfect
    or near-perfect clones of ordinary letters added to the standard.

    So much for the boundary conditions; now for possible solutions.

    There are two possibilities.

    1.) You can provide new character codes for all notational use of
    Latin/Greek letters where the glyphic repertoire is not identical to the
    natural range of glyphs that these characters exhibit when written as
    part of standard orthographies. If you do that, then please be complete,
    so that the pain does not come in repeated waves. That means addressing
    not just two Greek characters, but all the Latin and Greek characters
    that require special glyph design to harmonize with certain notations.
    The result will be that you can test fonts for their character
    repertoire to find out whether they support the new characters. You need
    to get all application vendors on board, so that sorting and searching
    can conflate the new characters with the old ones that had to be used
    before (and will continue to be used). Documents using the new
    characters will depend on fonts supporting those characters. Until then,
    they can only be exchanged in the context of font-embedding technologies
    (e.g. PDF).
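
    What "conflating the new characters with the old ones" would ask of
    search software can be sketched as a folding table. Everything named
    here is hypothetical: the two Private Use Area code points merely
    stand in for characters that do not exist in Unicode, and search_key
    is an invented helper, not any vendor's API.

```python
# Hypothetical stand-ins for newly encoded, unambiguous letters.
# Private Use Area code points are used because no such characters exist.
NEW_HANDLE_A = "\uE000"   # hypothetical "a with handle" for IPA
NEW_LOOPY_PHI = "\uE001"  # hypothetical loopy phi

# Fold each new character back to the ambiguous one used in existing data.
FOLD = str.maketrans({NEW_HANDLE_A: "a", NEW_LOOPY_PHI: "\u03C6"})


def search_key(text):
    """Return a key under which old and new encodings compare equal."""
    return text.translate(FOLD)


old = "pat"                     # existing data, ambiguous 'a'
new = "p" + NEW_HANDLE_A + "t"  # same transcription with the new character

print(old == new)                          # raw strings differ
print(search_key(old) == search_key(new))  # equal once folded
```

    Every application that searches or sorts would need such a table, and
    the table would have to be updated whenever further characters of this
    kind were added.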

    2.) You can provide a variation selector approach, where pairing a given
    variation selector with an *ambiguous* character will identify the
    preferred glyph shape. Well-written existing software would ignore the
    VS, and give you fallback behavior. All new documents would display at
    least as well as before, even in the absence of new fonts. Sort and
    search applications, if written to the existing specifications of
    Unicode, which require that a VS be ignored, would sort and search new
    and old IPA data alike. All you need to do to get the new glyphs is to
    have fonts supporting the Variation Sequences with new glyphs. You may
    need to work with display engine suppliers to enable such font features
    (but since such features are used for other scripts/contexts, this may
    not be as hard as it looks).

    Each of these possible solutions has a different mix of advantages and
    disadvantages. These need to be carefully weighed. In 2009, nearly 20
    years after the inception of the standard, backwards compatibility has
    to have a different importance than it had in 1989. That should enter
    this discussion and not be brushed off.

