Re: Custom fonts (was: Tolkien wanta-be)

From: Doug Ewell (
Date: Wed Mar 19 2003 - 02:32:46 EST

    Pim Blokland <pblokland at planet dot nl> wrote:

    > Now what you do in the privacy of your own home is none of our
    > concern, but when communicating with the outside world, there are
    > certain rules and guidelines you should abide by. And one of those
    > guidelines is a plaintext file should not have PUA characters in
    > them, unless its author also specifies it should be displayed using
    > a certain font.

    Not exactly. There must be an agreement between sender and receiver
    (author and reader, pitcher and catcher, whatever) as to how the PUA
    characters are to be interpreted. This doesn't necessarily require a
    specific font, only a font that follows the agreed PUA interpretation.

    For example, my invented script has been proposed for ConScript in the
    range from U+E690 through U+E6CF. It's supported in Code2000 by James
    Kass using this range. In the future, a font by Michael Everson will
    also be available, using the same range, and then there will be (at
    least) two fonts supporting the proposed ConScript range. At that time,
    it will no longer be crucial whether users are using Code2000 or
    Michael's font; both will display the same text, and the only
    differences will be in aesthetics and style (as it should be with
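
    As a rough illustration of what such an agreement means in practice,
    here is a minimal Python sketch. It assumes the agreement is simply
    "private-use characters in this text come from the proposed ConScript
    range U+E690 through U+E6CF"; the function names are made up for the
    example and are not taken from any library or font tool.

```python
# Sketch only: the agreed range is the proposed ConScript allocation named
# in the message; the function names are illustrative assumptions.
BMP_PUA = range(0xE000, 0xF8FF + 1)        # Private Use Area in the BMP
AGREED_RANGE = range(0xE690, 0xE6CF + 1)   # range the sender and receiver agreed on


def pua_code_points(text: str) -> list[int]:
    """Return the BMP private-use code points that occur in text, in order."""
    return [ord(ch) for ch in text if ord(ch) in BMP_PUA]


def covered_by_agreement(text: str) -> bool:
    """True if every private-use character in text lies in the agreed range."""
    return all(cp in AGREED_RANGE for cp in pua_code_points(text))


sample = "abc \ue690\ue6cf"
print([f"U+{cp:04X}" for cp in pua_code_points(sample)])
print(covered_by_agreement(sample))      # both characters are inside the range
print(covered_by_agreement("\ue00b"))    # U+E00B is private-use but outside it
```

    Nothing in the check mentions a font: any font that maps this range
    the same way would satisfy the same agreement.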

    > Lastly, I must say I think it's a pity that the suggestion I made
    > yesterday has been ignored so quietly. You know, in a HTML
    > environment, to retrieve names for characters from the font file
    > itself, to relieve the author from the task of having to enter
    > numerical values.
    > For an example, suppose you have a font named "Tengwar Quenya", with
    > a character named "hwesta" at U+E00B, you could use it in an XML
    > file by defining an entity, <!ENTITY hwesta "&#xE00B;">. Now my
    > suggestion was the browser program which displays this file should
    > be able to look at the font information in the XML file, open the
    > font file and retrieve the names of all characters in it, so it can
    > show the "&hwesta;" character (and all other characters) without
    > needing a long list of ENTITY entries in the XML.
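
    To make the suggestion concrete, here is a minimal Python sketch of
    the lookup Pim describes. The dictionary is a hypothetical stand-in
    for a name table read out of a font file (no real font is parsed
    here), and only the "hwesta" entry at U+E00B comes from his example.

```python
import re

# Hypothetical stand-in for glyph names a browser might read from a font;
# only the "hwesta" entry comes from the quoted message.
FONT_GLYPH_NAMES = {"hwesta": 0xE00B}

# Matches references of the form &name;
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def expand_entities(text, names=FONT_GLYPH_NAMES):
    """Replace &name; with the mapped character; leave unknown names as-is."""
    def substitute(match):
        code_point = names.get(match.group(1))
        return match.group(0) if code_point is None else chr(code_point)
    return ENTITY_REF.sub(substitute, text)


expanded = expand_entities("&hwesta; &unknown;")
print(expanded == "\ue00b &unknown;")
```

    The sketch sidesteps the hard part, which is getting a reliable
    name table out of the font in the first place.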

    There have been lots of attempts to define short mnemonic names or
    "entities" for Unicode. SGML names are one. The "i18nrep
    repertoiremap," originally defined in RFC 1345 and more recently used in
    ISO/IEC TR 14652, is another. These schemes work well for a relatively
    small number of characters, say a thousand, but become unwieldy and
    anti-mnemonic when applied to a larger set of characters. There simply
    aren't enough short mnemonic names to go around.

    It's possible that the name "hwesta" might catch on for this particular
    Tengwar letter, and then the scenario Pim describes might work (although
    asking a browser to interpret the internal structure of a font file
    seems excessive to me). But the same mechanism is less likely to work
    on other scripts, where character names are less likely to be easily,
    uniquely abbreviated (e.g. many scripts have a character called KA or
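
    The point about names like KA can be checked against the Unicode
    character names that ship with Python. This is only a sketch: it
    treats "a character called KA" as any character whose formal name
    ends in " LETTER KA", which is an assumption made for the example.

```python
import unicodedata


def letters_named(short_name):
    """Return {code point: name} for characters whose name ends in ' LETTER <short_name>'."""
    suffix = " LETTER " + short_name
    matches = {}
    for cp in range(0x110000):
        # Unnamed code points (unassigned, surrogates, etc.) yield "".
        name = unicodedata.name(chr(cp), "")
        if name.endswith(suffix):
            matches[cp] = name
    return matches


ka = letters_named("KA")
print(len(ka), "characters share the short name KA")
for cp, name in sorted(ka.items())[:5]:
    print(f"U+{cp:04X} {name}")
```

    A bare "&ka;" would therefore need a script prefix to be unique,
    and at that point the name is no longer short.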

    -Doug Ewell
     Fullerton, California
