Re: Custom fonts (was: Tolkien wanta-be)

From: Chris Jacobs
Date: Sun Mar 16 2003 - 18:16:31 EST

    ----- Original Message -----
    From: "Pim Blokland" <>
    To: "Unicode mailing list" <>
    Sent: Sunday, March 16, 2003 1:18 PM
    Subject: Re: Custom fonts (was: Tolkien wanta-be)

    > Chris Jacobs wrote:
    > > Mortbats code point 0034 is CANCER
    > > Arial Unicode MS code point 0034 is DIGIT FOUR
    > > Arial Unicode MS code point 264B is CANCER
    > No. First of all, this is the wrong example. This has got nothing to
    > do with private use characters. Cancer is not a private use
    > character!
    > I don't know the Mortbats font, but if this font has been designed
    > in accordance with the rules, it may have codepoint U+264B at index
    > #34. This should not cause problems or inconsistencies for the
    > display system.

    This is not a Unicode font. I don't know what tables it has internally, but
    if I have a text with a code point 0034 in it and display it using this
    font, I get a CANCER glyph, not a DIGIT FOUR glyph.

    A codepoint in itself does not specify a character.
    Font + codepoint does specify a character.

    Charset + codepoint also can specify a character.

    Codepoint 0034 could be anything.
    Codepoint U+0034 is uniquely DIGIT FOUR, since "U+" specifies the charset as
    Unicode.

    If you specify both font and charset, the two should not conflict. If a
    webpage has a codepoint 0034 in it, you should not declare the text as being
    both in font Mortbats and in charset Unicode.
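
    A minimal sketch of the distinction, in Python. The MORTBATS_GLYPHS table
    and both function names are made up for illustration; the only entry taken
    from this thread is 0034 -> CANCER, and I am not claiming this is how the
    font stores it internally.

```python
import unicodedata

# Hypothetical glyph table for a symbol font: font-local code -> what the glyph depicts.
# Only the 0x34 entry comes from this thread; the table itself is an illustration.
MORTBATS_GLYPHS = {0x34: "CANCER"}


def char_under_unicode(code):
    """Interpret a bare code as a Unicode code point and return its character name."""
    return unicodedata.name(chr(code))


def glyph_under_font(code, glyph_table):
    """Interpret a bare code through a font's own table; None if there is no entry."""
    return glyph_table.get(code)


print(char_under_unicode(0x34))                  # charset Unicode: DIGIT FOUR
print(char_under_unicode(0x264B))                # charset Unicode: CANCER
print(glyph_under_font(0x34, MORTBATS_GLYPHS))   # font table: CANCER
```

    The same number 0034 gives two different answers depending on whether the
    charset or the font is taken as the authority, which is why declaring both
    for the same text is contradictory.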

    > Secondly, the problem with the PUA is that it should not, and will
    > not, be subjected to regulations and guidelines. Font designers are
    > always free to put anything they want in there - characters,
    > transcoding hints, combining accents, what have you. That is what
    > the PUA is there for!
    > However, let's take a look at what you really want.
    > Suppose we have two custom fonts, A and B, both with 256 (custom)
    > characters, and you want to free yourself of the problems caused by
    > any overlapping codepoints they may have.
    > Do you want to be able to tell the system that if you output
    > character U+E000, for example, it should use font A, and if you
    > output character U+E100, it should use font B?

    Say font A has an apple symbol at E000, while font B has a banana there.
    Say that, for this reason, I gave font B an offset of 0100.

    Then, on my system, U+E000 in plaintext should indeed display an apple
    symbol and U+E100 a banana symbol.
    But if more than one font has an apple symbol, U+E000 by itself does not
    specify which font to use.
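
    The offset idea could be sketched roughly as follows. FONT_OFFSETS,
    FONT_SIZE and resolve() are hypothetical names, and the block size of 256
    is taken from Pim's example of two fonts with 256 custom characters each;
    no real system is implied to work this way.

```python
PUA_START = 0xE000
FONT_SIZE = 0x100  # assumption: each custom font carries 256 characters

# Hypothetical per-system registry: (font name, offset applied to the font's own codes).
FONT_OFFSETS = [("Font A", 0x0000), ("Font B", 0x0100)]


def resolve(system_cp):
    """Map a system-wide PUA code point to (font name, code point inside that font).

    Returns None when no installed font claims the code point.
    """
    for name, offset in FONT_OFFSETS:
        start = PUA_START + offset
        if start <= system_cp < start + FONT_SIZE:
            return name, system_cp - offset
    return None


print(resolve(0xE000))  # the apple: Font A, at its own E000
print(resolve(0xE100))  # the banana: Font B, also at its own E000
print(resolve(0xE200))  # unassigned on this system
```

    Each system-wide code point then resolves to exactly one font, so a
    plaintext utility never has to guess between the apple and the banana.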

    > What exactly is the use of this?

    It is the only consistent way to let plaintext utilities work properly in
    the PUA.

    Of course, such plaintext cannot be interchanged with other systems as it
    stands, but if needed it could be converted to a format that can be
    interchanged; the information about whether a given codepoint represents an
    apple or a banana would still be there.
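
    One possible shape for such a conversion, as a sketch only. LOCAL_PUA and
    to_interchange() are hypothetical, and the HTML span with a font-family
    style is just one interchange format among many; the point is that the
    local offset is undone and the font is named explicitly.

```python
import html

# Hypothetical local assignments: system code point -> (font name, the font's own code).
LOCAL_PUA = {
    0xE000: ("Font A", 0xE000),  # apple
    0xE100: ("Font B", 0xE000),  # banana
}


def to_interchange(text):
    """Rewrite locally remapped PUA characters as font-tagged HTML spans."""
    out = []
    for ch in text:
        entry = LOCAL_PUA.get(ord(ch))
        if entry is None:
            # Ordinary character: pass through, escaping HTML metacharacters.
            out.append(html.escape(ch))
        else:
            font, cp = entry
            # Name the font and use the code point the font itself defines.
            out.append(
                "<span style=\"font-family: '{}'\">&#x{:04X};</span>".format(font, cp)
            )
    return "".join(out)


print(to_interchange("fruit: \ue000 and \ue100"))
```

    The receiving system needs the named fonts, but it does not need to know
    which offsets were in use on my system.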

    > With a system like this, it would be impossible for, say, text files
    > or HTML files on the Internet to display characters like this.
    > Because what would you put in there to output, say, a Tinco? The
    > writer of the HTML file doesn't know at what codepoint offset you
    > have installed this Tengwar font.

    Which Tengwar font?

    He could specify the font as Tengwar Sindarin and use whatever codepoint Dan
    Smith gave it.
    Or he could specify Code2001 and use E000.

    If he used one of Dan Smith's Tengwar fonts, the charset should not be
    specified as Unicode.

    > A better approach would be to find a way to agree on the *names* for
    > the new characters.
    > A scenario could be envisioned where an XML file (or even HTML)
    > would contain the name of the font in a <FONT...> command; the
    > system would read this info, load the font and extract its name
    > table;

    Do all fonts have a name table then?

    > and after this point, the file can contain entries like
    > "&Tinco;" which the system then can display, provided there is a
    > character named "Tinco" in the font, of course!
    > (Note: this may not be as straightforward as it sounds. For one
    > thing, the <FONT > tag has been deprecated.
    > And the names of
    > characters in TrueType fonts are PostScript names, not HTML names,
    > so that a character like "periodcentered" should be addressed as
    > "&middot;". But these are details, details...)
    > Pim Blokland
    > P.S.

    This archive was generated by hypermail 2.1.5 : Sun Mar 16 2003 - 17:59:20 EST