RE: Proposal (was: "Missing character" glyph)

From: Peter_Constable@sil.org
Date: Sun Aug 04 2002 - 23:54:25 EDT


On 08/04/2002 11:37:10 AM "Carl W. Brown" wrote:

>There are many ways to implement this, but the principle is to provide a
>unique glyph for each different unrenderable character that can be traced
>to the code point.
>
>If there have to be changes to the font engines, I do not think that they
>will be major.

This might be a little trickier than you think.

The code that processes cmap lookups would have to be revised to detect
every cmap lookup that fails. You can't simply look later for all
instances of GID 0, since by that time you have lost the character info.
The cmap lookup does take place early in text rendering, prior to any
glyph reordering, substitution or positioning; in the OpenType/Uniscribe
architecture, it happens fairly early in the Uniscribe processing, before
any OT lookups. But you can't just have a font contain all of these
glyphs and then insert them into the glyph sequence at cmap-lookup time:
that would require every font to grow to many megabytes in size -- that's
a showstopper.
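To make the timing problem concrete, here's a rough sketch in C of where
the detection would have to live. The Cmap type, the cmap_lookup() call
and the UnmappedChar log are my own stand-ins for whatever a real engine
uses, not actual Uniscribe or FreeType API:

    #include <stddef.h>
    #include <stdint.h>

    typedef struct Cmap Cmap;                             /* opaque engine type (assumed) */
    uint16_t cmap_lookup(const Cmap *cmap, uint32_t cp);  /* assumed engine call */

    #define NOTDEF_GID 0   /* TrueType reserves glyph ID 0 for .notdef */

    typedef struct {
        uint32_t codepoint;   /* character that failed to map */
        size_t   glyph_slot;  /* position of its placeholder in the glyph buffer */
    } UnmappedChar;

    /* Map one character, recording any failure while the code point is
       still known; once the buffer holds only GID 0, that info is gone. */
    uint16_t map_char(const Cmap *cmap, uint32_t cp, size_t slot,
                      UnmappedChar *log, size_t *nlog)
    {
        uint16_t gid = cmap_lookup(cmap, cp);
        if (gid == NOTDEF_GID) {
            log[*nlog].codepoint  = cp;
            log[*nlog].glyph_slot = slot;
            (*nlog)++;
        }
        return gid;
    }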

I suppose you could instead create a special font that had all of these
glyphs and do a font switch on those characters, but there's still a
serious problem: that font would need to cover each of the > 1 million
codepoints in Unicode's codespace, and a TrueType font can only hold 64K
glyphs. I suppose you could create 17 of these fonts (each on the order
of 10-20 MB in size), but do you really want to do that?
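The arithmetic behind the 17-font figure, as a sketch (the plane-per-font
selection scheme here is purely illustrative, not anything deployed):

    #include <stdint.h>

    /* Unicode's codespace is U+0000..U+10FFFF: 17 planes of 0x10000
       code points each, which happens to match the 64K-glyph ceiling
       of a single TrueType font -- hence one fallback font per plane. */
    void select_fallback(uint32_t cp, unsigned *font_index,
                         uint16_t *glyph_index)
    {
        *font_index  = (unsigned)(cp >> 16);     /* 0..16: which of the 17 fonts */
        *glyph_index = (uint16_t)(cp & 0xFFFF);  /* glyph slot within that font */
    }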

The solution this really seems to be looking for would be to simply have
the rendering engine create those glyphs algorithmically at display time
-- just generate the raster image for each one on the fly. But then we
have to get into the rendering process early on to capture the character
info, and link that to the very last step -- rasterisation. You're going
to end up inserting placeholder glyphs at cmap-lookup time, creating a
data structure in which you record the corresponding characters, then at
each stage of glyph processing (both Uniscribe transformations and OT
lookups) keeping track of what impact, if any, the glyph transformations
had on those placeholder glyphs, and updating the data structure
accordingly. Oh, yes: you'd also need to make sure that the metrics of
the final rasterisation match the metrics of the placeholder glyph(s);
otherwise you could end up with lots of collisions. That all sounds
incredibly messy to me.
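For illustration, the bookkeeping that paragraph describes might look
something like the following; the Placeholder structure and the per-pass
update hook are invented for the sketch, not taken from Uniscribe or any
OT engine:

    #include <stddef.h>
    #include <stdint.h>

    /* One tracked placeholder: the original character plus wherever its
       glyph currently sits in the (mutating) glyph buffer. */
    typedef struct {
        uint32_t codepoint;
        size_t   glyph_slot;
        int      deleted;     /* a substitution may consume the glyph */
    } Placeholder;

    /* Run after every glyph-processing pass (Uniscribe shaping, each OT
       lookup): old_to_new[i] is glyph i's new position, or SIZE_MAX if
       the pass deleted it. */
    void update_placeholders(Placeholder *ph, size_t nph,
                             const size_t *old_to_new)
    {
        for (size_t i = 0; i < nph; i++) {
            if (ph[i].deleted)
                continue;
            size_t pos = old_to_new[ph[i].glyph_slot];
            if (pos == SIZE_MAX)
                ph[i].deleted = 1;
            else
                ph[i].glyph_slot = pos;
        }
    }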

I guess a way around some of that mess would be to leave everything until
near the end: get the final sequence of positioned glyphs, look for
instances of GID 0, then apply basically the same process used for
hit-testing to go from a screen position back to the underlying position
in the character sequence, and use that inferred character info to
generate a raster image algorithmically during rasterisation. But this
assumes that a character position can always be determined from the
positioned glyph sequence for any font. I don't know enough to comment on
whether that's a safe assumption in general or not.
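Assuming the engine does carry a glyph-to-character cluster map through
shaping (which hit-testing needs anyway), the end-of-pipeline pass might
be sketched like this; the two draw_* hooks and the map arrays are
hypothetical:

    #include <stddef.h>
    #include <stdint.h>

    void draw_glyph(uint16_t gid, int x, int y);                 /* assumed hook */
    void draw_notdef_with_codepoint(uint32_t cp, int x, int y);  /* assumed hook */

    typedef struct {
        uint16_t gid;   /* final glyph ID after all shaping */
        int x, y;       /* final position */
    } PositionedGlyph;

    /* Final pass: ordinary glyphs rasterise as usual; each GID 0 is
       traced back through the glyph-to-character map (the same mapping
       hit-testing relies on) and drawn with its code point baked in. */
    void rasterise(const PositionedGlyph *g, size_t n,
                   const size_t *glyph_to_char, const uint32_t *chars)
    {
        for (size_t i = 0; i < n; i++) {
            if (g[i].gid == 0)
                draw_notdef_with_codepoint(chars[glyph_to_char[i]],
                                           g[i].x, g[i].y);
            else
                draw_glyph(g[i].gid, g[i].x, g[i].y);
        }
    }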

Proposing a new display-notdef-glyph character is something that could be
done without consulting the font technologies industry ahead of time,
since supporting it doesn't require any change to font technologies. If
this alternative Carl has proposed is to be considered instead, however,
then I really think the Unicode folks need to get input from the font
technologies sector before assuming that changes to the font engines
would not be major.

- Peter

---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <peter_constable@sil.org>
