Re: "Missing character" glyph

From: Martin Kochanski (unicode@cardbox.net)
Date: Thu Aug 01 2002 - 14:14:29 EDT


The section that you mention makes no provision at all for the intentional display of the glyphs that it depicts.

At 10:05 01/08/02 -0700, Mark Davis wrote:
>The standard already makes a recommendation for the display of characters
>with missing glyphs. See page 108, Section 5.3. The charts index contains
>such images: see http://www.unicode.org/charts/
>
>I don't believe there is any need to add characters for use as missing
>glyphs.
>
>Mark
>__________________________________
>http://www.macchiato.com
>► “Eppur si muove” ◄
>
>----- Original Message -----
>From: "Martin Kochanski" <unicode@cardbox.net>
>To: "Otto Stolz" <Otto.Stolz@uni-konstanz.de>
>Cc: <unicode@unicode.org>
>Sent: Thursday, August 01, 2002 03:11
>Subject: Re: "Missing character" glyph
>
>
>> The responses from this mailing list have made me re-think the problem and
>propose a possible solution.
>>
>> The point about missing characters (more accurately, "unrendered
>characters") is that different fonts (more accurately, different
>combinations of font plus rendering system) display them in different ways.
>I have seen hollow squares and rectangles; filled rectangles; small
>diamond-shaped bullets; and question marks.
>>
>> Unrendered characters will become more noticeable as Unicode becomes more
>widespread and computing increasingly transcends linguistic and script
>boundaries. On the whole, with existing 7-bit and 8-bit national standards,
>a user in any particular country will find that any character that can be
>encoded can also be displayed, so that the distinction between encodable and
>displayable characters is one that simply does not need to occur to an
>ordinary user. But someone using Unicode to view (for example) Web pages
>from another country may find that the fonts on his computer are missing
>some vital characters, which the computer then renders in an arbitrary way
>(as hollow squares, etc.), leading to puzzlement and confusion. Eventually,
>as "large" Unicode fonts become more widely installed, the problem will
>diminish; but it will never entirely go away unless the Unicode standard
>stops evolving.
>>
>> There is a need to talk about what an unrendered character looks like when
>explaining the concept to a user and explaining that special actions may
>need to be taken (for instance, changing fonts or downloading a new version
>of a font).
>>
>> Printed manuals can handle unrendered characters quite easily. The manual
>can use one arbitrarily chosen appearance (such as U+25AF or U+2337) for
>unrendered characters, with a note (on first occurrence) that the screen
>appearance of unrendered characters may vary - screenshots can be given as
>examples.
>>
>> On-screen text does, however, present problems: especially Web pages. The
>writer of the text has no control over the font that will be used to display
>it [in some cases he may be able to specify or request the *name* of the
>font to be used, but this is no guarantee that the font of that name will
>contain all the needed characters or that it will even be installed on the
>user's computer]. There is a need to be able to say in a web page: "If some
>of the text on this page looks like this: ????? then you should install font
>XXXX / download a new font from [link]" - where ????? looks *exactly* how an
>unrendered character would look in the font that the web page is being
>displayed with.
>>
>> No presently defined Unicode character can be used to represent <?> in the
>above message. A hollow rectangle such as U+25AF or U+2337 will only
>resemble the screen appearance of unrendered characters if the font being
>used happens to use that particular sort of hollow rectangle to represent
>unrendered characters: in a font that uses small diamonds, representing <?>
>as a hollow square would be confusing and counter-productive.
>>
>> For the same reason, a bitmap cannot be used: a bitmap's appearance will
>not vary automatically as the font used to display the message changes.
>>
>> Rewriting the message to say "If a lot of the text on this page looks like
>hollow squares or small solid rectangles or little diamonds or anything else
>strange, then you should install font XXXX / download a new font from
>[link]" is not a practical solution because it adds complexity, obscurity,
>and verbosity; adds a level of abstraction that it is neither necessary nor
>easy for the user to follow; and uses up valuable screen space.
>>
>> It follows that there is a need for a defined Unicode character that
>represents the appearance of an unrendered character in the font in which it
>is displayed.
>>
>> I am wondering whether it would be worth submitting a proposal for such a
>character. For example:
>> U+024F UNRENDERED CHARACTER
>>
>> While the addition of characters to Unicode is something to be done only
>as a last resort, I believe that there is, in this case, no alternative.
>>
>> Such a character proposal would have the advantage that every existing
>Unicode font *already* implements it correctly - by definition [but see the
>note below about section 5.3 of the Unicode standard]. Thus no changes will
>be needed to fonts or to rendering engines.
>>
>> To look at it another way, virtually the only action that the Unicode
>Consortium needs to take to define UNRENDERED CHARACTER is to promise never
>to define a character at that code point.
>>
>> UNRENDERED CHARACTER has to be part of the BMP for backward compatibility:
>it should be renderable as a single glyph, not as a pair of glyphs, even on
>old systems that do not understand surrogates. The proposed positioning is
>intended to persuade older systems that this character should be rendered
>conventionally, like a Latin letter.
>>
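To illustrate the surrogate point in the paragraph above, here is a minimal Python sketch of how a code point maps to UTF-16 code units. It is not part of the original proposal, and the function name utf16_units is made up for this example. A BMP code point such as the proposed U+024F occupies a single code unit, whereas anything beyond U+FFFF needs a surrogate pair, which a system that does not understand surrogates would treat as two separate items.

```python
def utf16_units(code_point):
    """Return the list of UTF-16 code units that encode one code point.

    Hypothetical helper for illustration only.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded on their own")
    if code_point < 0x10000:
        # BMP: the code point is its own single 16-bit code unit.
        return [code_point]
    # Supplementary planes: split the 20-bit offset into a high and low surrogate.
    offset = code_point - 0x10000
    return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]


if __name__ == "__main__":
    for cp in (0x024F, 0xFFFD, 0x1D11E):
        units = " ".join("%04X" % u for u in utf16_units(cp))
        print("U+%04X -> %s" % (cp, units))
```

For example, utf16_units(0x024F) yields one unit, while utf16_units(0x1D11E) yields the pair 0xD834, 0xDD1E.
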
>> The nearest possible alternatives are:
>>
>> U+FFFE - on at least some Windows systems, this is displayed correctly
>(i.e. identically to characters that are missing from the current font); but
>in the Unicode standard it has the explicit semantics of not being a
>character at all, and so ought not to be intentionally used as a character
>(a rendering engine would be within its rights to suppress it altogether;
>some application programs might report errors or even become confused about
>byte ordering).
>>
>> U+FFFD - on at least some Windows systems, this is displayed correctly
>(i.e. identically to characters that are missing from the current font); but
>in the Unicode standard it has the explicit semantics of being a replacement
>for a character *unrepresentable in Unicode*. A character unrepresentable in
>Unicode is not the same as a Unicode character that happens not to have a
>representation in the current font. It is possible that a particular font
>may have distinctive visual representations of U+FFFC and U+FFFD that are
>distinct from the way that it draws unrendered characters.
>>
>> Otto Stolz suggested U+03A2, which would be equally valid. However, U+03A2
>is quite obviously the code for GREEK CAPITAL LETTER FINAL SIGMA. For O.S.,
>this is a reason for using the code (because there is, in fact, no such
>letter, so the code can be used); for me, this is a strong reason for *not*
>using the code, because if it **ever** became necessary to encode GREEK
>CAPITAL LETTER FINAL SIGMA then no character other than U+03A2 would be
>acceptable, whereas U+024F has no inherent semantics at all.
>>
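The difference in status between the candidates listed above can be checked mechanically. The sketch below is an illustration only, assuming Python's standard unicodedata module; is_noncharacter and describe are hypothetical helper names. It separates noncharacters (such as U+FFFE), assigned characters with defined semantics (such as U+FFFD and U+FFFC), and merely unassigned code points (such as U+03A2).

```python
import unicodedata


def is_noncharacter(cp):
    """True for U+FDD0..U+FDEF and for any code point ending in FFFE or FFFF."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def describe(cp):
    """Classify a code point as noncharacter, unassigned, or by its name.

    Hypothetical helper; the answer for unassigned code points depends on
    the Unicode version bundled with the Python build.
    """
    if is_noncharacter(cp):
        return "noncharacter"
    ch = chr(cp)
    if unicodedata.category(ch) == "Cn":
        # General category Cn (after excluding noncharacters) means reserved.
        return "unassigned"
    return unicodedata.name(ch, "assigned (no name available)")


if __name__ == "__main__":
    for cp in (0xFFFE, 0xFFFD, 0xFFFC, 0x03A2):
        print("U+%04X: %s" % (cp, describe(cp)))
```

The result reflects whatever version of the Unicode database the interpreter ships with, so a code point that was unassigned when this message was written, such as U+024F, may be reported as assigned by a newer database.
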
>> Section 5.3 of the Unicode standard makes a distinction between unassigned
>and unrenderable characters. Systems that make use of this distinction are
>an exception to the statement I made earlier that "every existing Unicode
>font already renders UNRENDERED CHARACTER correctly". Nevertheless, the
>rendering of UNRENDERED CHARACTER as "unassigned" rather than "unrenderable"
>is unlikely to cause much confusion.
>>
>> One other exception would be a pathologically helpful font/engine that
>represents each unrendered character as a unique glyph (for example, a
>miniature of the character's hexadecimal value). This, again, would not be a
>problem: the user will instantly recognize "miniature 024F" as being
>different from ordinary characters and in the same class as the "miniature
>021D" glyphs that disfigure the page.
>>
>> Would it be worth submitting a proposal for UNRENDERED CHARACTER? As I
>said, it *is* adequately implemented already: the only purpose for wanting
>it defined in the standard is to prevent the implementation from being
>suddenly broken in the future.
>>
>> - Martin Kochanski.


