Re: Why people still want to encode precomposed letters

From: Jukka K. Korpela
Date: Sun Nov 16 2008 - 12:12:50 CST

  • Next message: Andrew Cunningham: "Re: Why people still want to encode precomposed letters"

    David Starner wrote:

    > On Sat, Nov 15, 2008 at 5:55 PM, Jukka K. Korpela
    > <> wrote:
    >> Too bad if you really need those characters. But encoding new
    >> letters with diacritics as code points wouldn't help. Even if it
    >> were possible to add them into Unicode, it would take many many
    >> years before they have been added there and implemented widely in
    >> fonts that are available on people's computers.
    > I don't believe that.

    Consider, for example, the case of DIAMETER SIGN, U+2300, a fairly common
    technical symbol. It has been in Unicode since 1993. Font support currently
    consists of Arial Unicode MS, generally available only when MS Office has
    been installed (and maybe not even then), and a handful of special fonts
    that most people have never seen. Note that Arial Unicode MS is really a
    single typeface, lacking italic and bold variants.
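    A character's identity can be confirmed independently of font support by
    querying the Unicode character database; a minimal sketch in Python, using
    the standard unicodedata module:

```python
import unicodedata

# U+2300 has a name in the Unicode character database regardless of
# whether any installed font contains a glyph for it.
ch = "\u2300"
print(unicodedata.name(ch))  # DIAMETER SIGN
```

    Whether the character then *displays* is a separate question, decided by
    the fonts available to the renderer, which is exactly the gap described
    above.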

    If that’s how far we’ve got in 15 years with a common, language-independent
    symbol, how fast do you expect progress to be with language-specific
    precomposed characters, typically for languages spoken by relatively small
    numbers of people?

    > I'm seeing Latin Extended C on Alanwood's page just fine,

    I am not. I can see only about half of the characters – and my computer has
    a fairly good repertoire of fonts with extensive coverage, including
    Code2000 and Doulos SIL. The problem is that I am currently using, as most
    of the world does, Internet Explorer as my web browser. Actually, even in
    Firefox, two of the characters are displayed as dummies, namely as “last
    resort” glyphs (specifically, glyphs containing the Unicode code number in
    a square).

    Of course, part of the problem is caused by IE’s inability to use backup
    fonts for characters not found in the primary font. But on the other hand,
    this would not be a problem if the primary font contained a sufficient
    number of characters. And mixing fonts is always a problem, often a big
    problem, especially with letters – consider what happens when a word
    contains a letter from another (and possibly quite different) font.

    > and it would be trivial to ask people who want to see those
    > characters to add one of several that support it.

    One of several fonts, you mean, I suppose. Well, it might be trivial to ask
    people to do such things, but to begin with, a huge number of people can’t
    install any fonts on the computer they use at work or on public premises.
    Moreover, most ordinary computer users don’t even know how to install a
    font. And could you list the fonts that support all of Latin Extended-C,
    for example? I don’t think there are many of them, and some of them might
    not qualify for general use (due to their typographic characteristics).

    > How many characters
    > that would have been encoded in Unicode 5.0 if it hadn't been for that
    > rule have several fonts that well-support them?

    I cannot quite interpret (or actually even parse) the question. It seems to
    postulate a rule that a character cannot be added to Unicode unless there
    are several fonts that “well-support” it. Is there really such a rule?

    What I wrote was simply an argument saying that even if Unicode principles
    were changed to allow new letters with diacritic marks to be added as
    precomposed characters – which is an unrealistic assumption, made only for
    the sake of argument – this would not magically make such characters usable
    in practice.

    It probably needs to be added that no law prevents implementors from making
    rendering systems that map combinations of letters and diacritic marks,
    presented as two code points, to a single glyph.
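    As a sketch of that point – an illustration using standard Unicode
    normalization data, not any particular rendering engine's implementation –
    where a precomposed code point already exists, the mapping from a
    two-code-point sequence to it is defined by NFC normalization, and a
    renderer can reuse the same data when selecting a single glyph:

```python
import unicodedata

# A letter followed by a combining diacritic, encoded as two code points.
decomposed = "e\u0301"  # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT
assert len(decomposed) == 2

# NFC normalization maps the pair to the single precomposed code point
# where Unicode defines one; a renderer can consult the same composition
# data to pick one glyph for the sequence.
composed = unicodedata.normalize("NFC", decomposed)
assert composed == "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE
assert len(composed) == 1

# Where no precomposed form exists, the sequence stays two code points,
# and the renderer must position the mark over the base glyph itself.
rare = unicodedata.normalize("NFC", "x\u0301")
assert len(rare) == 2
```

    The second case is the interesting one for the letters under discussion:
    even without a precomposed code point, a sufficiently capable renderer and
    font can still produce a single well-formed glyph from the sequence.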


    This archive was generated by hypermail 2.1.5 : Sun Nov 16 2008 - 12:16:38 CST