RE: Revised proposal for "Missing character" glyph

From: Carl W. Brown (cbrown@xnetinc.com)
Date: Mon Aug 19 2002 - 23:33:38 EDT


Ken,

This is an alternative to representing bad glyphs with a missing glyph
character. People can implement either.

> -----Original Message-----
> From: Kenneth Whistler [mailto:kenw@sybase.com]
> Sent: Friday, August 16, 2002 2:28 PM
> To: cbrown@xnetinc.com
> Cc: unicode@unicode.org; kenw@sybase.com
> Subject: Re: Revised proposal for "Missing character" glyph
>
>
> > Proposed unknown and missing character representation. This would be an
> > alternative to the method currently described in 5.3.
> >
> > The missing or unknown character would be represented as a series of
> > vertical hex digit pairs for each byte of the character.
>
> The problem I have with this is that it seems to be an overengineered
> approach that conflates two issues:
>
> a. What does a font do when requested to display a character
> (or sequence) for which it has no glyph.
>
> b. What does a user do to diagnose text content that may be
> causing a rendering failure.
>
> For the first problem, we already have a widespread approach that
> seems adequate. And other correspondents on this topic have pointed
> out that the particular approach of displaying hex numbers for
> characters may pose technical difficulties for at least some font
> technologies.
>

Because proportional fonts require font metrics processing, the rendering
process must already be able to determine whether a character cannot be
rendered. The logic can be changed to use a special font with 257 glyphs to
produce these characters. Thus it should be possible to incorporate this into
the operating system code rather than into each application. It would be best
to put it in OpenType or equivalent code, but not all systems have this type
of code. ICU's layout code would also be a good place.

Systems limited to monospaced fonts will have problems implementing this.
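
A rough sketch of that fallback logic, in Python for illustration only. The
function name, the `covered` set, and the choice of UTF-16BE bytes are
assumptions made for this sketch; the proposal says only "each byte of the
character". The sketch uses 256 byte-value glyphs and leaves out the 257th
glyph of the special font.

```python
def fallback_glyphs(text, covered):
    """Map text to render items, substituting hex pairs for missing glyphs.

    `covered` is a hypothetical set of code points the active font can draw.
    Each uncovered character becomes one ("hexpair", "XX") item per byte of
    its UTF-16BE form (an assumed encoding, not fixed by the proposal).
    """
    items = []
    for ch in text:
        if ord(ch) in covered:
            # The normal font has a glyph: render as usual.
            items.append(("glyph", ch))
        else:
            # "surrogatepass" lets lone surrogates through so that truly
            # bad data is also shown as hex rather than raising an error.
            for b in ch.encode("utf-16-be", "surrogatepass"):
                items.append(("hexpair", f"{b:02X}"))
    return items
```

With a font that covers only U+0041, `fallback_glyphs("A\u0c05", {0x41})`
yields the normal glyph for "A" followed by the pairs 0C and 05. A real
implementation would draw each pair as a single glyph with the two digits
stacked vertically.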

>
> >
> > This representation would be recognized by untrained people as
> > unrenderable data or garbage. So it would serve the same function as a
> > missing glyph character except that it would be different from normal
> > glyphs so that they would know that something was wrong and the text did
> > not just happen to have funny characters.
>
> I don't see any particular problem in training people to recognize when
> they are seeing their fonts' notdef glyphs. The whole concept of "seeing
> little boxes where the characters should be" is not hard to explain to
> people -- even to people who otherwise have difficulty with a lot of
> computer abstractions.
>
> Things will be better-behaved when applications finally get past the
> related but worse problem of screwing up the character encodings --
> which results in the more typical misdisplay: lots of recognizable
> glyphs, but randomly arranged into nonsensical junk. (Ah, yes, that
> must be another piece of Korean spam mail in my mail tray.)
>

Unicode text will do more to fix character encoding problems. Then the
problem will be either truly bad characters or font problems. Many systems
have difficulty handling sets of fonts that each cover a portion of the
character range. This would provide an indication of which scripts were
missing. Yes, you could use the suggested script ID glyphs, but that would
require special processing that would be as difficult to implement as this.

> >
> > It would aid people in finding the problem and for people with Unicode
> > books the text would be decipherable. If the information was truly
> > critical they could have the text deciphered.
>
> Rather than trying to engineer a questionable solution into the fonts,
> I'd like to step back and ask what would better serve the user
> in such circumstances.
>
> And an approach which strikes me as a much more useful and extensible
> way to deal with this would be the concept of a "What's This?"
> text accessory. Essentially a small tool that a user could select
> a piece of text with (think of it like a little magnifying glass,
> if you will), which will then pop up the contents selected, deconstructed
> into its character sequence explicitly. Limited versions of such things
> exist already -- such as the tooltip-like popup windows for Asmus'
> Unibook program, which give attribute information for characters
> in the code chart. But I'm thinking of something a little more generic,
> associated with textedit/richedit type text editing areas (or associated
> with general word processing programs).
>
> The reason why such an approach is more extensible is that it is not
> merely focussed on the nondisplayable character glyph issue, but rather
> represents a general ability to "query" text, whether normally
> displayable or not. I could query a black box notdef glyph to find
> out what in the text caused its display; but I could just as well
> query a properly displayed Telugu glyph, for example, to find out what
> it was, as well.
>
> This is comparable (although more point-oriented) to the concept of
> giving people a source display for HTML, so they can figure out
> what in the markup is causing rendering problems for their rich
> text content.
>

Text query will require that each application be modified to support this
feature, and it will require special user training. This might be nice, but
except for very special applications I think that it is impractical.
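
For reference, the core of such a text query is small; the cost is in wiring
it into each application's selection handling. A minimal sketch in Python,
where the function name and output format are assumptions rather than
anything described in this thread:

```python
import unicodedata

def describe_selection(text):
    """Deconstruct selected text into an explicit character sequence.

    Returns one line per character: the code point in U+XXXX form followed
    by its Unicode character name, or "<unnamed>" if it has none.
    """
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
        for ch in text
    ]
```

This works the same way whether or not the selected character can be
displayed, since it reads the underlying text rather than the glyphs.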

>
> > This proposal would provide a standardized approach that vendors could
> > adopt to clarify missing character rendering and reduce support costs. By
> > including this in the standard we could provide a cross vendor approach.
> > This would provide a consistent solution.
>
> In my opinion, the standard already provides a description of a
> cross-vendor approach to the notdef glyph problem, with the advantage
> that it is the de facto, widely adopted approach as well. As long as font
> vendors stay away from making {p}'s and {q}'s their notdef glyphs, as I
> think we can safely presume they will, and instead use variants on the
> themes of hollowed or filled boxes, then the problem of *recognition* of
> the notdef glyphs for what they are is a pretty marginal problem.
>
> And as for how to provide users better diagnostics for figuring out the
> content of undisplayable text, I suppose the standard could suggest some
> implementation guidelines there, but this might be a better area to just
> leave up to competing implementation practice until certain user interface
> models catch on and get widespread acceptance.
>

The problem is how you implement them and train users. It should be
something that can be made part of the OS or common services code. If the
user has to do more than a screen print, then it will be difficult to use.

Carl


