RE: Proposal (was: "Missing character" glyph)

From: Carl W. Brown (cbrown@xnetinc.com)
Date: Mon Aug 05 2002 - 23:20:31 EDT


Peter,

Fujitsu implemented it in the font rendering engine. Many fonts, such as
TrueType fonts, are encoded in Unicode even if the text is in a code page.
Thus the Euro would always be x'0020AC' even if you are running a Windows
code page with the Euro mod.
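
As a rough illustration (not Fujitsu's actual code, and the table below is
deliberately incomplete), the point is that the code-page byte gets mapped to
Unicode before the font's cmap is ever consulted:

    /* Sketch only: text arrives in Windows-1252, but the TrueType cmap is
     * keyed by Unicode, so the byte is converted before the glyph lookup. */
    #include <stdint.h>

    static uint32_t cp1252_to_unicode(uint8_t byte)
    {
        if (byte == 0x80)      /* the Euro slot added by the code-page mod */
            return 0x20AC;     /* U+20AC EURO SIGN, the key the cmap expects */
        if (byte < 0x80)
            return byte;       /* ASCII maps straight through */
        return 0xFFFD;         /* remaining entries omitted in this sketch */
    }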

Using glyph positioning you can, even with the worst-case variation, limit
the number of glyphs needed to 87. This would use the same technology as
Indic and Southeast Asian text.
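
A minimal sketch of the decomposition this relies on (the positioning details
and glyph inventory are assumptions on my part): the codepoint is reduced to a
small set of hex-digit glyphs that get positioned inside a notdef box, much as
Indic shaping positions marks within a cluster:

    #include <stdint.h>
    #include <stdio.h>

    /* Produce the hex digits that the positioning logic would lay out. */
    static void notdef_digits(uint32_t cp, char out[7])
    {
        /* 4 digits for BMP code points, 6 for the supplementary planes */
        snprintf(out, 7, cp > 0xFFFF ? "%06X" : "%04X", cp);
    }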

Carl

  -----Original Message-----
  From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On
Behalf Of Peter_Constable@sil.org
  Sent: Sunday, August 04, 2002 8:54 PM
  To: unicode@unicode.org
  Subject: RE: Proposal (was: "Missing character" glyph)

  On 08/04/2002 11:37:10 AM "Carl W. Brown" wrote:

>There are many ways to implement this, but the principle is to provide a
>unique glyph for each different unrenderable character that can be traced
>to the code point.
>
>If there have to be changes to the font engines, I do not think that they
>will be major.

  This might be a little trickier than you think.

  The code that's involved with processing cmap lookups would have to be
revised to detect all cmap lookups that fail. You can't simply look later
for all instances of GID 0 since by that time you have lost the character
info. But, the cmap lookup takes place early in the text rendering, prior to
any glyph reordering, substitution or positioning. In the OpenType/Uniscribe
architecture, it would happen fairly early in the Uniscribe processing, and
prior to any OT lookups. But, you can't just have a font contain all of
these glyphs, and then insert those glyphs into the glyph sequence at
cmap-lookup time: that would require every font to become many megabytes in
size -- that's a showstopper.
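
  To make the failure point concrete, here is a sketch using FreeType's
FT_Get_Char_Index as a stand-in for the engine's own cmap code (an
assumption; Uniscribe's internals are not public):

    #include <stddef.h>
    #include <ft2build.h>
    #include FT_FREETYPE_H

    /* Return the glyph index, recording the character when the lookup
     * fails; the caller is assumed to size the `missing' buffer. */
    static FT_UInt map_char(FT_Face face, FT_ULong ch,
                            FT_ULong *missing, size_t *n_missing)
    {
        FT_UInt gid = FT_Get_Char_Index(face, ch);  /* 0 == not in the cmap */
        if (gid == 0)
            missing[(*n_missing)++] = ch;  /* keep the character info here,
                                              before it is lost to GID 0 */
        return gid;
    }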

  I suppose you could create a font that had all of these glyphs, and then
do a font switch on those characters, but there's still a serious problem in
that you'd need this special font to cover each of the > 1 million
codepoints in Unicode's codespace, and TrueType fonts can only handle 64K
glyphs. I suppose you could create 17 of these fonts (each on the order of
10 MB - 20 MB in size), but do you really want to do that?
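
  For what it's worth, the plane arithmetic behind those 17 fonts is simple;
the font handles here are hypothetical, and the point is only that each
64K-glyph TrueType font could cover one plane at best:

    #include <stdint.h>

    #define NUM_PLANES 17

    typedef struct Font Font;               /* opaque handle (assumption) */
    static Font *notdef_fonts[NUM_PLANES];  /* one ~64K-glyph font per plane */

    static Font *fallback_font_for(uint32_t cp)
    {
        uint32_t plane = cp >> 16;          /* planes 0..16 of the codespace */
        return (plane < NUM_PLANES) ? notdef_fonts[plane] : 0;
    }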

  The solution that this seems really to be looking for would be to simply
have the rendering engine algorithmically create those glyphs at display
time -- just generate the raster image for each one on the fly. But then,
we're having to get into the rendering process early on to get the character
info, and then link that to the very last step -- rasterisation. You're
going to end up inserting placeholder glyphs at cmap-lookup time, creating a
data structure in which you record the corresponding characters, and then at
each stage of glyph processing (both Uniscribe transformations and OT
lookups) keeping track of what impact, if any, the glyph transformations had
on those placeholder glyphs, updating the data structure accordingly. Oh, yes:
you'd also need to make sure that the metrics of the final rasterisation
match the metrics for the placeholder glyph(s), otherwise you could end up
with lots of collisions. That all sounds incredibly messy to me.
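
  One way to picture that bookkeeping (all names here are hypothetical): a
side table ties each placeholder glyph slot back to its source character, and
every reordering, substitution or positioning pass has to keep it in sync:

    #include <stdint.h>
    #include <stddef.h>

    typedef struct {
        size_t   glyph_index;   /* current slot in the glyph sequence      */
        uint32_t codepoint;     /* the unrenderable character to rasterise */
    } NotdefSlot;

    typedef struct {
        NotdefSlot *slots;
        size_t      count;
    } NotdefTable;

    /* Called after any pass that moves a glyph, so the table stays valid. */
    static void notdef_adjust(NotdefTable *t, size_t old_pos, size_t new_pos)
    {
        for (size_t i = 0; i < t->count; i++)
            if (t->slots[i].glyph_index == old_pos)
                t->slots[i].glyph_index = new_pos;
    }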

  I guess a way around some of that mess would be to leave everything to
near the end: get the final sequence of positioned glyphs, look for
instances of GID 0, then apply basically the same process used for
hit-testing to go from a screen position back to the underlying position in
the character sequence, then use that inferred character info to
algorithmically generate a raster image during rasterisation. But this
assumes that a character position can always be determined from the
positioned glyph sequence for any font. I don't know enough to comment on
whether that's a safe assumption in general or not.
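
  Sketched out, that late-stage variant would look something like this; the
glyph-to-character map is exactly the assumption in question, and the
structures and the draw_hex_box callback are illustrative, not a Uniscribe
API:

    #include <stdint.h>
    #include <stddef.h>

    typedef struct {
        uint16_t gid;        /* final glyph id; 0 means .notdef            */
        size_t   char_pos;   /* index back into the original character run */
    } PositionedGlyph;

    static void draw_notdefs(const PositionedGlyph *glyphs, size_t n_glyphs,
                             const uint32_t *chars,
                             void (*draw_hex_box)(uint32_t cp, size_t slot))
    {
        for (size_t i = 0; i < n_glyphs; i++)
            if (glyphs[i].gid == 0)              /* unrenderable character */
                draw_hex_box(chars[glyphs[i].char_pos], i);
    }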

  Proposing a new display-notdef-glyph character is something that could be
done without consulting the font technologies industry ahead of time since
the implementation for supporting it doesn't involve any change in font
technologies. If this alternative Carl has proposed is to be considered
instead, however, then I really think the Unicode folks need to get input
input from the font technologies sector before assuming that changes to font
engines would not be major.

  - Peter

  ---------------------------------------------------------------------------
  Peter Constable

  Non-Roman Script Initiative, SIL International
  7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
  Tel: +1 972 708 7485
  E-mail: <peter_constable@sil.org>


