Re: Revised proposal for "Missing character" glyph

From: Kenneth Whistler (
Date: Mon Aug 26 2002 - 17:01:09 EDT

[Resend of a response which got eaten by the Unicode email
during the system maintenance last week. Carl already responded
to me on this, but others may not have seen what he was
responding to. --Ken]

> Proposed unknown and missing character representation. This would be an
> alternate to method currently described in 5.3.
> The missing or unknown character would be represented as a series of
> vertical hex digit pairs for each byte of the character.

The problem I have with this is that is seems to be an overengineered
approach that conflates two issues:

  a. What does a font do when requested to display a character
     (or sequence) for which it has no glyph.

  b. What does a user do to diagnose text content that may be
     causing a rendering failure.

For the first problem, we already have a widespread approach that
seems adequate. And other correspondents on this topic have pointed
out that the particular approach of displaying up hex numbers for
characters may pose technical difficulties for at least some font

> This representation would be recognized by untrained people as unrenderable
> data or garbage. So it would serve the same function as a missing glyph
> character except that it would be different from normal glyphs so that they
> would know that something was wrong and the text did not just happen to have
> funny characters.

I don't see any particular problem in training people to recognize when
they are seeing their fonts' notdef glyphs. The whole concept of "seeing
little boxes where the characters should be" is not hard to explain to
people -- even to people who otherwise have difficulty with a lot of
computer abstractions.

Things will be better-behaved when applications finally get past the
related but worse problem of screwing up the character encodings --
which results in the more typical misdisplay: lots of recognizable
glyphs, but randomly arranged into nonsensical junk. (Ah, yes, that
must be another piece of Korean spam mail in my mail tray.)

> It would aid people in finding the problem and for people with Unicode books
> the text would be decipherable. If the information was truly critical they
> could have the text deciphered.

Rather than trying to engineer a questionable solution into the fonts,
I'd like to step back and ask what would better serve the user
in such circumstances.

And an approach which strikes me as a much more useful and extensible
way to deal with this would be the concept of a "What's This?"
text accessory. Essentially a small tool that a user could select
a piece of text with (think of it like a little magnifying glass,
if you will), which will then pop up the contents selected, deconstructed
into its character sequence explicitly. Limited versions of such things
exist already -- such as the tooltip-like popup windows for Asmus'
Unibook program, which give attribute information for characters
in the code chart. But I'm thinking of something a little more generic,
associated with textedit/richedit type text editing areas (or associated
with general word processing programs).

The reason why such an approach is more extensible is that it is not
merely focussed on the nondisplayable character glyph issue, but rather
represents a general ability to "query" text, whether normally
displayable or not. I could query a black box notdef glyph to find
out what in the text caused its display; but I could just as well
query a properly displayed Telugu glyph, for example, to find out what
it was, as well.

This is comparable (although more point-oriented) to the concept of
giving people a source display for HTML, so they can figure out
what in the markup is causing rendering problems for their rich
text content.


> This proposal would provide a standardized approach that vendors could adopt
> to clarify missing character rendering and reduce support costs. By
> including this in the standard we could provide a cross vendor approach.
> This would provide a consistent solution.

In my opinion, the standard already provides a description of a cross-vendor
approach to the notdef glyph problem, with the advantage that it is
the de facto, widely adopted approach as well. As long as font vendors stay
away from making {p}'s and {q}'s their notdef glyphs, as I think we can
safely presume they will, and instead use variants on the themes of hollowed
or filled boxes, then the problem of *recognition* of the notdef glyphs
for what they are is a pretty marginal problem.

And as for how to provide users better diagnostics for figuring out the
content of undisplayable text, I suppose the standard could suggest some
implementation guidelines there, but this might be a better area to just
leave up to competing implementation practice until certain user interface
models catch on and get widespread acceptance.


This archive was generated by hypermail 2.1.2 : Mon Aug 26 2002 - 15:28:39 EDT