Re: Custom fonts

From: Pim Blokland (
Date: Wed Mar 19 2003 - 17:44:42 EST

    Doug Ewell wrote:

    > There have been lots of attempts to define short mnemonic names or
    > "entities" for Unicode. SGML names are one. The "i18nrep
    > repertoiremap," originally defined in RFC 1345 and more recently
    > used in ISO/IEC TR 14652, is another. These schemes work well for a
    > small number of characters, say a thousand, but become unwieldy and
    > anti-mnemonic when applied to a larger set of characters. There
    > aren't enough short mnemonic names to go around.

    Yes, but my suggestion was to read character names on a per-font
    basis. So when an HTML file contains something like <FONT
    face="Symbol"> only the Symbol font has to be read and scanned for
    names. This completely nullifies the need to have the complete list
    of entity names in memory at all times.
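
    A minimal sketch of the per-font idea, assuming fonts can be
    modelled as plain dictionaries of glyph names. FONT_FILES,
    glyph_names and resolve are hypothetical names for illustration,
    not any browser's or font library's API; a real implementation
    would read the names from the font file itself.

```python
from functools import lru_cache

# Hypothetical stand-in for font files: face -> {glyph name: index in font}.
# The "integralex" index is the one quoted in this thread; the second
# entry is made up purely so that there is a font which is never read.
FONT_FILES = {
    "Symbol": {"integralex": 0xF0F4},
    "OtherFont": {"example": 0x0041},
}

SCANNED = []  # records which fonts were actually read and scanned


@lru_cache(maxsize=None)
def glyph_names(face):
    """Scan one font for its glyph names, the first time it is requested."""
    SCANNED.append(face)
    return dict(FONT_FILES.get(face, {}))


def resolve(face, name):
    """Return the in-font index of a named character, or None if absent."""
    return glyph_names(face).get(name)


# Only the font named in <FONT face="Symbol"> gets scanned.
index = resolve("Symbol", "integralex")
missing = resolve("Symbol", "nosuchname")
```

    Only the font named in the FONT tag is scanned, and it is scanned
    once; no global list of entity names is kept in memory.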

    > (..) the scenario Pim describes might work (although
    > asking a browser to interpret the internal structure of a font
    > seems excessive to me). But the same mechanism is less likely to

    Is it really so excessive? Browsers already have to retrieve a lot
    of font info, like the line height and such. Why not character
    names? I could easily envision a function call like

    and John Hudson wrote:

    > You are presuming that all fonts contain name strings for glyphs.

    As a matter of fact, I know they don't, and even if they do, it's
    often names like "uniF10C" which aren't really very helpful.
    Naturally my scheme would break down if the given fonts do not
    contain the character names we expect. But in the scheme currently
    in use, things go wrong if the fonts don't contain the numerical
    indexes we expect! As a real-life example, if the browser program
    assumes a character like "INTEGRAL EXTENSION" to be in the Symbol
    font at codepoint U+23AE, but its index in the font really is
    U+F0F4, the lookup fails. Now if the browser were to look for a
    character named "integralex" in the font, the character would be
    found, no matter what codepoint it was on.
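
    The difference between the two lookups can be sketched as
    follows. The font is again modelled as a hypothetical pair of
    dictionaries (SYMBOL_FONT, find_by_codepoint and find_by_name are
    illustrative names only); the codepoints and the glyph name are
    the ones from the example above.

```python
# Hypothetical model of one font: the same glyph reachable two ways.
SYMBOL_FONT = {
    "by_index": {0xF0F4: "glyph-integralex"},       # index used in the font
    "by_name": {"integralex": "glyph-integralex"},  # name stored in the font
}


def find_by_codepoint(font, codepoint):
    """Look a glyph up by numerical index; None if the index is unused."""
    return font["by_index"].get(codepoint)


def find_by_name(font, name):
    """Look a glyph up by its name; None if the font has no such name."""
    return font["by_name"].get(name)


# The browser expects U+23AE, but the font has the glyph at U+F0F4.
by_expected_codepoint = find_by_codepoint(SYMBOL_FONT, 0x23AE)
by_actual_index = find_by_codepoint(SYMBOL_FONT, 0xF0F4)
by_name = find_by_name(SYMBOL_FONT, "integralex")
```

    The codepoint lookup depends on the browser and the font agreeing
    on an index, while the name lookup depends on them agreeing on a
    name. Either can fail, but for different reasons.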

    Anyway, I'm aware this is getting a bit out of scope for the Unicode
    Mailing List, so I propose we leave it at this. Unless someone is
    willing to rewrite an Internet browser, in which case I'll be happy
    to continue this discussion by private e-mail...

    Pim Blokland
