Re: "Missing character" glyph

From: Mark Davis (mark.davis@jtcsv.com)
Date: Thu Aug 01 2002 - 13:05:28 EDT


The standard already makes a recommendation for the display of characters
with missing glyphs. See page 108, Section 5.3. The charts index contains
such images: see http://www.unicode.org/charts/

I don't believe there is any need to add characters for use as missing
glyphs.

Mark
__________________________________
http://www.macchiato.com
► “Eppur si muove” ◄

----- Original Message -----
From: "Martin Kochanski" <unicode@cardbox.net>
To: "Otto Stolz" <Otto.Stolz@uni-konstanz.de>
Cc: <unicode@unicode.org>
Sent: Thursday, August 01, 2002 03:11
Subject: Re: "Missing character" glyph

> The responses from this mailing list have made me re-think the problem and
propose a possible solution.
>
> The point about missing characters (more accurately, "unrendered
characters") is that different fonts (more accurately, different
combinations of font plus rendering system) display them in different ways.
I have seen hollow squares and rectangles; filled rectangles; small
diamond-shaped bullets; and question marks.
>
> Unrendered characters will become more noticeable as Unicode becomes more
widespread and computing increasingly transcends linguistic and script
boundaries. On the whole, with existing 7-bit and 8-bit national standards,
a user in any particular country will find that any character that can be
encoded can also be displayed, so that the distinction between encodable and
displayable characters is one that simply does not need to occur to an
ordinary user. But someone using Unicode to view (for example) Web pages
from another country may find that the fonts on his computer are missing
some vital characters, which the computer then renders in an arbitrary way
(as hollow squares, etc.), leading to puzzlement and confusion. Eventually,
as "large" Unicode fonts become more widely installed, the problem will
diminish; but it will never entirely go away unless the Unicode standard
stops evolving.
>
> There is a need to talk about what an unrendered character looks like when
explaining the concept to a user and explaining that special actions may
need to be taken (for instance, changing fonts or downloading a new version
of a font).
>
> Printed manuals can handle unrendered characters quite easily. The manual
can use one arbitrarily chosen appearance (such as U+25AF or U+2337) for
unrendered characters, with a note (on first occurrence) that the screen
appearance of unrendered characters may vary - screenshots can be given as
examples.
>
> On-screen text does, however, present problems: especially Web pages. The
writer of the text has no control over the font that will be used to display
it [in some cases he may be able to specify or request the *name* of the
font to be used, but this is no guarantee that the font of that name will
contain all the needed characters or that it will even be installed on the
user's computer]. There is a need to be able to say in a web page: "If some
of the text on this page looks like this: ????? then you should install font
XXXX / download a new font from [link]" - where ????? looks *exactly* how an
unrendered character would look in the font that the web page is being
displayed with.
>
> No presently defined Unicode character can be used to represent <?> in the
above message. A hollow rectangle such as U+25AF or U+2337 will only
resemble the screen appearance of unrendered characters if the font being
used happens to use that particular sort of hollow rectangle to represent
unrendered characters: in a font that uses small diamonds, representing <?>
as a hollow square would be confusing and counter-productive.
>
> For the same reason, a bitmap cannot be used: a bitmap's appearance will
not vary automatically as the font used to display the message changes.
>
> Rewriting the message to say "If a lot of the text on this page looks like
hollow squares or small solid rectangles or little diamonds or anything else
strange, then you should install font XXXX / download a new font from
[link]" is not a practical solution because it adds complexity, obscurity,
and verbosity; adds a level of abstraction that it is neither necessary nor
easy for the user to follow; and uses up valuable screen space.
>
> It follows that there is a need for a defined Unicode character that
represents the appearance of an unrendered character in the font in which it
is displayed.
>
> I am wondering whether it would be worth submitting a proposal for such a
character. For example:
> U+024F UNRENDERED CHARACTER
>
> While the addition of characters to Unicode is something to be done only
as a last resort, I believe that there is, in this case, no alternative.
>
> Such a character proposal would have the advantage that every existing
Unicode font *already* implements it correctly - by definition [but see the
note below about section 5.3 of the Unicode standard]. Thus no changes will
be needed to fonts or to rendering engines.
>
> To look at it another way, virtually the only action that the Unicode
Consortium needs to take to define UNRENDERED CHARACTER is to promise never
to define a character at that code point.
>
> UNRENDERED CHARACTER has to be part of the BMP for backward compatibility:
it should be renderable as a single glyph, not as a pair of glyphs, even on
old systems that do not understand surrogates. The proposed positioning is
intended to persuade older systems that this character should be rendered
conventionally, like a Latin letter.
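
A rough Python sketch of the BMP point above: a code point in the BMP takes a single 16-bit code unit in UTF-16, while anything outside it needs a surrogate pair. U+10300 is chosen here only for contrast and does not come from the original message; the helper names are illustrative.

```python
def utf16_code_units(ch: str) -> int:
    """Return how many 16-bit code units UTF-16 needs for one character."""
    # Encode without a BOM so that only the character itself is counted.
    return len(ch.encode("utf-16-be")) // 2


def is_bmp(ch: str) -> bool:
    """True if the character lies in the Basic Multilingual Plane."""
    return ord(ch) <= 0xFFFF


proposed = "\u024f"            # code point suggested in the message
supplementary = "\U00010300"   # arbitrary non-BMP code point (assumption)

for ch in (proposed, supplementary):
    print(f"U+{ord(ch):04X}: bmp={is_bmp(ch)}, "
          f"utf16_units={utf16_code_units(ch)}")
```

A system that does not understand surrogates would see the second character as two separate units, which is the situation the proposal is trying to avoid.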
>
> The nearest possible alternatives are:
>
> U+FFFE - on at least some Windows systems, this is displayed correctly
(i.e. identically to characters that are missing from the current font); but
in the Unicode standard it has the explicit semantics of not being a
character at all, and so ought not to be intentionally used as a character
(a rendering engine would be within its rights to suppress it altogether;
some application programs might report errors or even become confused about
byte ordering).
>
> U+FFFD - on at least some Windows systems, this is displayed correctly
(i.e. identically to characters that are missing from the current font); but
in the Unicode standard it has the explicit semantics of being a replacement
for a character *unrepresentable in Unicode*. A character unrepresentable in
Unicode is not the same as a Unicode character that happens not to have a
representation in the current font. It is possible that a particular font
may have distinctive visual representations of U+FFFC and U+FFFD that are
distinct from the way that it draws unrendered characters.
>
> Otto Stolz suggested U+03A2, which would be equally valid. However, U+03A2
is quite obviously the code for GREEK CAPITAL LETTER FINAL SIGMA. For O.S.,
this is a reason for using the code (because there is, in fact, no such
letter, so the code can be used); for me, this is a strong reason for *not*
using the code, because if it **ever** became necessary to encode GREEK
CAPITAL LETTER FINAL SIGMA then no character other than U+03A2 would be
acceptable, whereas U+024F has no inherent semantics at all.
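
The properties of the three alternatives discussed above can be inspected with Python's standard unicodedata module, as a minimal sketch. Note that the module reports whatever Unicode version ships with the interpreter, which may differ from the version of the standard under discussion here. The sketch also shows the byte-ordering hazard mentioned for U+FFFE and the usual decoder behaviour that produces U+FFFD.

```python
import unicodedata


def describe(cp: int) -> str:
    """Summarise the general category and name of a code point."""
    ch = chr(cp)
    category = unicodedata.category(ch)
    name = unicodedata.name(ch, "<no name>")  # default for unnamed code points
    return f"U+{cp:04X} category={category} name={name}"


for cp in (0xFFFE, 0xFFFD, 0x03A2):
    print(describe(cp))

# Byte-order confusion: a little-endian byte order mark (U+FEFF),
# read as big-endian, comes out as U+FFFE.
bom_le = "\ufeff".encode("utf-16-le")
misread = bom_le.decode("utf-16-be")
print(f"misread BOM: U+{ord(misread):04X}")

# U+FFFD is what a decoder substitutes for input it cannot convert.
replaced = b"\xff".decode("utf-8", errors="replace")
print(f"replacement: U+{ord(replaced):04X}")
```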
>
> Section 5.3 of the Unicode standard makes a distinction between unassigned
and unrenderable characters. Systems that make use of this distinction are
an exception to the statement I made earlier that "every existing Unicode
font already renders UNRENDERED CHARACTER correctly". Nevertheless, the
rendering of UNRENDERED CHARACTER as "unassigned" rather than "unrenderable"
is unlikely to cause much confusion.
>
> One other exception would be a pathologically helpful font/engine that
represents each unrendered character as a unique glyph (for example, a
miniature of the character's hexadecimal value). This, again, would not be a
problem: the user will instantly recognize "miniature 024F" as being
different from ordinary characters and in the same class as the "miniature
021D" glyphs that disfigure the page.
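
To make the two exceptions above concrete, here is a small Python sketch of a renderer that separates "unassigned" from "unrenderable" code points and produces the hexadecimal label for a miniature-hex fallback glyph. Everything in it is an assumption for illustration: the font is modelled as a plain set of covered code points (toy_font), and general category Cn is used as a stand-in for "unassigned", which also sweeps in noncharacters. It is not taken from any real rendering engine.

```python
import unicodedata
from typing import Set


def classify(ch: str, font_coverage: Set[int]) -> str:
    """Classify a character for a hypothetical fallback renderer."""
    if ord(ch) in font_coverage:
        return "rendered"
    # Cn = unassigned or noncharacter in the interpreter's Unicode data.
    if unicodedata.category(ch) == "Cn":
        return "unassigned"
    # Assigned in the standard, but this font has no glyph for it.
    return "unrenderable"


def hex_box_label(ch: str) -> str:
    """Text a 'miniature hex' fallback glyph would show for a character."""
    return f"{ord(ch):04X}"


# Hypothetical font covering printable ASCII only.
toy_font = set(range(0x20, 0x7F))

for ch in ("A", "\u0416", "\u03a2"):
    print(hex_box_label(ch), classify(ch, toy_font))
```

With a fallback of this kind every missing character gets its own label, so no single glyph could stand for "the" unrendered appearance, which is the case the paragraph above describes.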
>
> Would it be worth submitting a proposal for UNRENDERED CHARACTER? As I
said, it *is* adequately implemented already: the only purpose for wanting
it defined in the standard is to prevent the implementation from being
suddenly broken in the future.
>
> - Martin Kochanski.