From: Chris Jacobs (c.t.m.jacobs@hccnet.nl)
Date: Sun Mar 16 2003 - 18:16:31 EST
----- Original Message -----
From: "Pim Blokland" <pblokland@planet.nl>
To: "Unicode mailing list" <unicode@unicode.org>
Sent: Sunday, March 16, 2003 1:18 PM
Subject: Re: Custom fonts (was: Tolkien wanta-be)
> Chris Jacobs wrote:
>
> > Mortbats code point 0034 is CANCER
> > Arial Unicode MS code point 0034 is DIGIT FOUR
> > Arial Unicode MS code point 264B is CANCER
>
> No. First of all, this is the wrong example. This has got nothing to
> do with private use characters. Cancer is not a private use
> character!
> I don't know the Mortbats font, but if this font has been designed
> in accordance with the rules, it may have codepoint U+264B at index
> #34. This should not cause problems or inconsistencies for the
> display system.
This is not a Unicode font. I don't know what tables it has internally, but
if I have a text with a code point 0034 in it, and I then display it
using this font, I get a CANCER glyph, not a DIGIT FOUR glyph.
A codepoint by itself does not specify a character.
Font + codepoint does specify a character.
Charset + codepoint can also specify a character.
Codepoint 0034 on its own could be anything.
Codepoint U+0034 is uniquely DIGIT FOUR, since "U+" specifies the charset as
Unicode.
If you specify both font and charset, those should not conflict. If a webpage
has a codepoint 0034 in it, you should not specify the text as being both in
font Mortbats and in charset Unicode.
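
To make the point concrete, here is a minimal Python sketch. The FONT_TABLES
dictionary and both function names are made up for illustration, and the
table holds only the three entries mentioned in this thread; it is not the
real internal table of either font.

```python
import unicodedata

# Hypothetical per-font tables: code found in the text -> character drawn.
# Only the three entries discussed in this thread are listed; the real
# internal tables of these fonts are not known here.
FONT_TABLES = {
    "Mortbats": {0x0034: "\u264B"},
    "Arial Unicode MS": {0x0034: "\u0034", 0x264B: "\u264B"},
}


def displayed_character(font, code):
    """Return the Unicode name of what `font` draws for `code`."""
    return unicodedata.name(FONT_TABLES[font][code])


def unicode_character(code):
    """With the charset fixed as Unicode ("U+"), the code alone is enough."""
    return unicodedata.name(chr(code))
```

With these tables, displayed_character("Mortbats", 0x34) gives CANCER while
unicode_character(0x34) gives DIGIT FOUR, which is exactly the conflict you
get when the same text is labelled with both.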
> Secondly, the problem with the PUA is that it should not, and will
> not, be subjected to regulations and guidelines. Font designers are
> always free to put anything they want in there - characters,
> transcoding hints, combining accents, what have you. That is what
> the PUA is there for!
>
> However, let's take a look at what you really want.
> Suppose we have two custom fonts, A and B, both with 256 (custom)
> characters, and you want to free yourself of the problems caused by
> any overlapping codepoints they may have.
> Do you want to be able to tell the system that if you output
> character U+E000, for example, it should use font A, and if you
> output character U+E100, it should use font B?
Say font A has an apple symbol at E000, while font B has a banana there.
Say that for this reason I gave font B an offset of 0100.
Then on my system U+E000 in plaintext should indeed display an apple symbol
and U+E100 a banana symbol.
But if more than one font has an apple symbol, U+E000 does not specify which
font to use.
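
A rough sketch of such an offset table in Python, just to show the lookup I
have in mind. The INSTALLED list, the glyph names and the resolve() function
are all hypothetical; no existing system is known to work this way.

```python
PUA_START, PUA_END = 0xE000, 0xF8FF  # BMP Private Use Area

# Hypothetical install table: (font name, local offset, glyphs keyed by the
# code point the font designer originally chose).
INSTALLED = [
    ("Font A", 0x0000, {0xE000: "apple"}),
    ("Font B", 0x0100, {0xE000: "banana"}),
]


def resolve(code_point):
    """Map a local PUA code point to (font, font's own code point, glyph)."""
    if not PUA_START <= code_point <= PUA_END:
        raise ValueError("not a private use code point")
    for name, offset, glyphs in INSTALLED:
        original = code_point - offset  # undo the local offset
        if original in glyphs:
            return name, original, glyphs[original]
    return None  # no installed font claims this code point
```

So resolve(0xE000) picks the apple from font A, and resolve(0xE100) picks the
banana that font B itself keeps at E000.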
> What exactly is the use of this?
It is the only consistent way to let plaintext utilities work properly in
the PUA.
Of course such plaintext cannot, as such, be interchanged with other systems,
but if needed it could be converted to a format which can be interchanged;
the information whether a certain codepoint represents an apple or a banana
would still be there.
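
As an illustration of that conversion, a small Python sketch that splits
local plaintext into runs tagged with a font name, each run using the font's
own codepoints again. It assumes every custom font occupies one block of 256
codepoints, as in the example above; FONTS_BY_BLOCK and to_interchange() are
invented names, and the output format is only one possibility.

```python
PUA_START, PUA_END = 0xE000, 0xF8FF
BLOCK = 0x100  # assumption: each custom font gets a 256-codepoint block

# Hypothetical local assignment of blocks to fonts.
FONTS_BY_BLOCK = {0: "Font A", 1: "Font B"}


def to_interchange(text):
    """Split text into (font, string) runs; font is None outside the PUA."""
    runs = []
    for ch in text:
        cp = ord(ch)
        font, out = None, ch
        if PUA_START <= cp <= PUA_END:
            block, index = divmod(cp - PUA_START, BLOCK)
            font = FONTS_BY_BLOCK.get(block)
            if font is not None:
                # Back to the codepoint the font itself uses.
                out = chr(PUA_START + index)
        if runs and runs[-1][0] == font:
            runs[-1] = (font, runs[-1][1] + out)  # extend the current run
        else:
            runs.append((font, out))
    return runs
```

Each run could then be written out with an explicit font name, so the
receiving side does not need to know my local offsets.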
> With a system like this, it would be impossible for, say, text files
> or HTML files on the Internet to display characters like this.
> Because what would you put in there to output, say, a Tinco? The
> writer of the HTML file doesn't know at what codepoint offset you
> have installed this Tengwar font.
Which Tengwar font?
He could specify the font as Tengwar Sindarin and use whatever codepoint Dan
Smith gave it.
Or he could specify Code2001 and use E000.
If he used one of Dan Smith's Tengwar fonts, the charset should not be
specified as Unicode.
> A better approach would be to find a way to agree on the *names* for
> the new characters.
> A scenario could be envisioned where an XML file (or even HTML)
> would contain the name of the font in a <FONT...> command; the
> system would read this info, load the font and extract its name
> table;
Do all fonts have a name table then?
> and after this point, the file can contain entries like
> "&Tinco;" which the system then can display, provided there is a
> character named "Tinco" in the font, of course!
> (Note: this may not be as straightforward as it sounds. For one
> thing, the <FONT > tag has been deprecated. And the names of
> characters in TrueType fonts are PostScript names, not HTML names,
> so that a character like "periodcentered" should be addressed as
> "&middot;". But these are details, details...)
>
> Pim Blokland
>
> P.S.
>
>