Re: RE: Character identities

From: Jim Allan (
Date: Tue Oct 29 2002 - 20:53:59 EST

    The Old Icelandic character ǫ (Unicode U+01EB: LATIN SMALL LETTER O
    WITH OGONEK) is replaced in modern Icelandic by ö.

    Would it be proper therefore to represent U+00F6, the code point which
    Marco Cimarosti wants to use for o with superscript e, also for o with
    ogonek?

    In Icelandic they could be called the same character. Of course that
    only works for Icelandic. We could not use this font for German or
    English or French, unless we build some kind of recognition of language
    tags into it.

    In French the circumflex accent indicates an earlier superscript s over
    the vowel. So should we allow combining superscript s as a variant glyph
    for the circumflex? But what of French text containing transliterated
    Arabic names or Welsh names or transliterated classical Greek names
    which use a circumflex which never had such a meaning? Again we would
    need language tagging.

    The Old English and Middle English letter thorn (þ) is replaced in
    Modern English by the combination th. Would it make sense then for a
    modern font to represent U+00FE by a glyph showing th? Would it also
    make sense to replace the kinds of glyphs used for U+204A TIRONIAN SIGN
    ET with an ampersand? The meaning is exactly the same. But what if we
    want to use this font for Icelandic or Old English? Do we again need an
    intelligent font that understands language tagging?

    Do we now have different flavors of Unicode, one for English, one for
    Icelandic, one for French, one for German ... ? What of other languages?

    A diaeresis used in the transliterated Classical names Peirithoüs and
    Menelaüs is not the same as a superscript e, though in German (and some
    other languages) sounds once indicated by superscript e over a vowel
    are now indicated by diaeresis over a vowel. A font which rendered
    every diaeresis over u or o or a as superscript e would therefore be
    incorrect for classical names cited and also possibly for other foreign
    names. How would J.R.R. Tolkien's name Eärendil be rendered by such a
    font, where the diaeresis indicates separate pronunciation of a, not an
    umlauted a?
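
    To make the problem concrete, here is a minimal Python sketch of what
    such a font would in effect be doing. The function name
    umlaut_to_superscript_e is purely hypothetical, and a real font would
    substitute glyphs rather than rewrite the text, but the language-blind
    rule is the same: every diaeresis over a, o or u is treated as if it
    were a superscript e.

```python
import unicodedata

DIAERESIS = "\u0308"      # COMBINING DIAERESIS
SUPERSCRIPT_E = "\u0364"  # COMBINING LATIN SMALL LETTER E


def umlaut_to_superscript_e(text):
    """Hypothetical language-blind rule: any diaeresis directly after
    a, o or u is replaced by a combining superscript e."""
    # Decompose first so that precomposed letters such as U+00E4 become
    # a base letter followed by U+0308.
    decomposed = unicodedata.normalize("NFD", text)
    out = []
    for i, ch in enumerate(decomposed):
        if ch == DIAERESIS and i > 0 and decomposed[i - 1] in "aouAOU":
            out.append(SUPERSCRIPT_E)
        else:
            out.append(ch)
    return "".join(out)


# A German word (where the rule matches the intent) next to names where
# the diaeresis marks separate pronunciation and the rule goes wrong.
for word in ["sch\u00f6n", "E\u00e4rendil", "Menela\u00fcs"]:
    result = umlaut_to_superscript_e(word)
    print(word, "->", " ".join(f"U+{ord(c):04X}" for c in result))
```

    The rule has no way to tell the German umlaut from the diaeresis in
    Eärendil or Menelaüs, so all three words are changed alike. Only
    language information from outside the text could separate them.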

    Surely it makes more sense for an author or advertising designer who
    wishes to use u with superscript e to use the Unicode method of u
    followed by a combining superscript e, so that it will appear as
    desired in any font, rather than to use a font change? Font changes
    should not
    change the orthography or spelling of the original but should represent
    transparently what the writer intended, and Unicode gives us a clear way
    to distinguish combining superscript e from combining diaeresis and
    combining superscript s from combining circumflex.
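
    As a rough illustration of that distinction, the following Python
    sketch uses only the standard unicodedata module. The variable and
    function names are mine, chosen for the example. It shows that u with
    diaeresis and u with combining superscript e are different character
    sequences, and that normalization does not merge them.

```python
import unicodedata

u_diaeresis = "u\u0308"      # u + COMBINING DIAERESIS
u_superscript_e = "u\u0364"  # u + COMBINING LATIN SMALL LETTER E
precomposed = "\u00fc"       # LATIN SMALL LETTER U WITH DIAERESIS


def describe(text):
    """Return the Unicode character name of each code point in text."""
    return [unicodedata.name(ch) for ch in text]


print(describe(u_diaeresis))
print(describe(u_superscript_e))

# NFC composes u + diaeresis into the single precomposed letter ...
assert unicodedata.normalize("NFC", u_diaeresis) == precomposed
# ... but leaves u + superscript e exactly as the writer typed it.
assert unicodedata.normalize("NFC", u_superscript_e) == u_superscript_e
# The two spellings never compare equal, whatever font displays them.
assert unicodedata.normalize("NFD", precomposed) != u_superscript_e
```

    Whatever a given font draws, the stored text keeps the writer's choice
    between the two marks.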

    Using the Unicode method makes far more sense than creating fonts that
    work for particular languages only, provided no foreign words or names
    appear, or which require language tagging.

    In most European languages æ and œ are ligatures at one time commonly
    used in names and technical words of Latin origin. Modern stylistic
    preference is to avoid these ligatures. However French uses œ for a
    particular sound, though the use of that ligature instead of oe was not
    considered important enough for œ to be generally available on French
    typewriters. Also both digraphs were separate letters in Old English,
    whence the use of æ still in modern Danish and Icelandic. Should this
    modern convention be properly indicated in an intelligent font by using
    unconnected ae and oe for these digraphs except where language
    tagging indicates Danish, Icelandic, or older Scandinavian use or Old
    English? Should we have to language tag Encyclopædia Britannica to be
    sure that æ appears in the name properly connected?

    In fact, the stylistic conventions are indicated not by font changes or
    tagging but by typing the appropriate characters.

    Should an English language font render ö as oe, so that Göthe appears
    automatically in the more normal English form Goethe?

    Marco's desire to use a font to indicate combining superscript e
    instead of the way Unicode wants it done seems prompted by the fact
    that most Unicode fonts do not currently support the combining
    superscript characters, and he wishes a fallback to normal diaeresis
    instead of to an undefined character indicator.

    This is a reasonable wish.

    In light of current Unicode support, the hack of identifying diaeresis
    with combining superscript e makes sense.

    There has never been anything wrong with using a hack when required for
    a task at hand. But hacks of this kind, if followed up widely in many
    fonts in many languages, would produce a chaos of interpretations and
    numerous fonts suited only for particular languages, filtering the
    text and not presenting what is there unless complex and otherwise
    unnecessary tagging is added.

    Surely this is not what Unicode should be?

    If a writer uses a long s in modern writing, whether quoting text of an
    earlier era or purposely being archaic, normal fonts should display a
    long s, not a short s on the grounds that long s happens not to be
    normally used in modern writing in Antiqua fonts.

    If a writer decides between using ü, ue, or uͤ (u with combining
    superscript e), the font should leave the text alone.

    If you have a newer version of the Code 2000 font on your machine which
    contains the combining superscripts, then the superscript e appears
    correctly in newer browsers, even if you are using a different font for
    the base character. A diacritic from one font is placed over the base
    character of another.

    I can understand Marco not wishing to bother viewers with the demand to
    load a particular font and also knowing that dynamic downloading of a
    font will not work with every system or browser or with user settings of
    browsers. So use the hack for now. In two or three years, hopefully, it
    will not be necessary.

    Generally a font should not be correcting the text.

    The use of macron for diaeresis is a somewhat different matter. If a
    particular style of German script uses a line for a diaeresis, then
    indeed the diaeresis in that script has fallen together in appearance
    with the macron. This would be especially so if a diaeresis was used
    over e and i (in foreign words and names). Representing diaeresis by a
    glyph of macron form would be no more of a hack than would be the use in
    an English script font of a p with an ascender, though presumably an
    Icelander would identify that as the letter þ, not p. (How þ itself
    should be presented in such a script font is problematical!)

    The main difficulty with identification of diaeresis and combining
    superscript e is that the identification does not work universally, even
    within German, if foreign names or words appear. Even in German text,
    combining superscript e may not always correctly replace diaeresis.

    Jim Allan
