Unicode Frequently Asked Questions

Display of Unsupported Characters

Q: How should characters be displayed if the rendering system doesn't fully support them?

A: There are three main options, depending on the type of character involved. Some characters should not display at all (zero-width invisible characters); some should display as a visible but blank space; and some should be displayed with one or more generic glyphs, often referred to as "missing glyphs" or a ".notdef glyph". For more information on missing glyphs, see the text under "Interpretable but Unrenderable Characters" in Section 5.3, Unknown and Missing Characters, of The Unicode Standard.

Q: Which characters should be displayed as a visible but blank space?

A: This is the easy one: all the characters that have the White_Space property, also generically known as “whitespace characters”. This set includes SPACE, of course, but also such characters as the tab control character, NO-BREAK SPACE, LINE SEPARATOR, and so on. For the full list, see the White_Space values in PropList.txt.
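As a minimal sketch of that lookup, the Python below hardcodes the White_Space ranges as we understand them from PropList.txt in recent Unicode versions. The table is hand-transcribed (an assumption worth re-checking against the current file), and the names WHITE_SPACE_RANGES and is_white_space are illustrative, not from any library.

```python
# Hand-transcribed snapshot of White_Space from PropList.txt (recent versions).
# Assumption: regenerate from the actual file before relying on it.
WHITE_SPACE_RANGES = [
    (0x0009, 0x000D),  # TAB, LF, VT, FF, CR
    (0x0020, 0x0020),  # SPACE
    (0x0085, 0x0085),  # NEXT LINE (NEL)
    (0x00A0, 0x00A0),  # NO-BREAK SPACE
    (0x1680, 0x1680),  # OGHAM SPACE MARK
    (0x2000, 0x200A),  # EN QUAD..HAIR SPACE
    (0x2028, 0x2028),  # LINE SEPARATOR
    (0x2029, 0x2029),  # PARAGRAPH SEPARATOR
    (0x202F, 0x202F),  # NARROW NO-BREAK SPACE
    (0x205F, 0x205F),  # MEDIUM MATHEMATICAL SPACE
    (0x3000, 0x3000),  # IDEOGRAPHIC SPACE
]


def is_white_space(cp: int) -> bool:
    """True if cp has the White_Space property, per the snapshot above.

    Such characters should be shown as a visible blank space when unsupported.
    """
    return any(lo <= cp <= hi for lo, hi in WHITE_SPACE_RANGES)
```

Python's built-in str.isspace() is not a substitute: it is defined by general category Zs and bidi classes WS, B and S, so it also returns True for U+001C..U+001F, which do not have the White_Space property.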

Q: Which characters should be displayed with a missing glyph?

A: All regular graphic characters that the rendering system cannot display. PUA characters and most unassigned code points should also be displayed with a missing glyph, as there is no general way to tell what kind of character they might be intended for. Any other character that has neither the White_Space property nor the Default_Ignorable_Code_Point property should also be displayed with a missing glyph.
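The three-way choice described so far can be sketched as a small decision function. This is an illustrative first cut, not a normative algorithm: the two property lookups are passed in as parameters (assumed to be built from PropList.txt and DerivedCoreProperties.txt), the function name is hypothetical, and it deliberately ignores the exceptions among default ignorable code points.

```python
from typing import Callable


def fallback_for_unsupported(
    cp: int,
    is_white_space: Callable[[int], bool],
    is_default_ignorable: Callable[[int], bool],
) -> str:
    """Pick a fallback for a code point the renderer cannot display.

    Returns "blank" (visible blank space), "invisible" (zero width, no glyph)
    or "missing" (draw the missing / .notdef glyph).
    """
    if is_white_space(cp):
        return "blank"
    if is_default_ignorable(cp):
        # First cut only: some default ignorables are better shown as missing.
        return "invisible"
    # Everything else: graphic characters, private use (gc=Co), unassigned
    # code points outside the default-ignorable ranges (gc=Cn), and so on.
    return "missing"
```

For example, with stand-in predicates `lambda c: c == 0x20` and `lambda c: c == 0x200B`, U+E000 (a PUA code point) and U+0378 (unassigned) both come back as "missing".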

Q: What about default ignorable code points, then?

A: That is the difficult case. The full list can be found under Default_Ignorable_Code_Point in DerivedCoreProperties.txt. Some of these characters, most notably the ISO control characters, are best displayed with a missing glyph.
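Rather than hardcoding the list, a renderer can read it from the data file (published in the Unicode Character Database, e.g. at https://www.unicode.org/Public/UCD/latest/ucd/DerivedCoreProperties.txt). A minimal Python sketch of a parser for that file's `range ; property # comment` line format follows; the function names are hypothetical, and SAMPLE is an abbreviated excerpt written in the file's format, not the real file.

```python
from __future__ import annotations


def parse_property_ranges(ucd_text: str, prop: str) -> list[tuple[int, int]]:
    """Collect (first, last) code point ranges for `prop` from UCD-format text.

    Data lines look like:  200B..200F    ; Default_Ignorable_Code_Point # Cf ...
    """
    ranges: list[tuple[int, int]] = []
    for line in ucd_text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != prop:
            continue  # a line for some other property
        first, _, last = fields[0].partition("..")
        ranges.append((int(first, 16), int(last or first, 16)))
    return ranges


def in_ranges(cp: int, ranges: list[tuple[int, int]]) -> bool:
    return any(lo <= cp <= hi for lo, hi in ranges)


SAMPLE = """\
# Derived Property: Default_Ignorable_Code_Point
00AD          ; Default_Ignorable_Code_Point # Cf       SOFT HYPHEN
200B..200F    ; Default_Ignorable_Code_Point # Cf   [5] ZERO WIDTH SPACE..RIGHT-TO-LEFT MARK
0041..005A    ; Alphabetic # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
"""

default_ignorable = parse_property_ranges(SAMPLE, "Default_Ignorable_Code_Point")
```

The same parser works for PropList.txt (for example with `prop="White_Space"`), since both files share this format.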

Q: So which default ignorable code points should be invisible, if not supported?

A: They form a mixed collection of characters, all of which are best rendered as completely invisible (and non-advancing, i.e. “zero width”) if not explicitly supported in rendering. These include:

  • cursive joiners (U+200C ZWNJ, U+200D ZWJ)

  • bidirectional format controls (e.g., U+200E LEFT-TO-RIGHT MARK)

  • the soft hyphen (U+00AD SOFT HYPHEN)

  • word joiners (U+2060 WORD JOINER, also U+FEFF ZWNBSP)

  • the zero width space (U+200B ZERO WIDTH SPACE)

  • invisible math operators (e.g., U+2061 FUNCTION APPLICATION)

  • Jamo filler characters (e.g., U+115F HANGUL CHOSEONG FILLER)

  • variation selectors
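The categories above can be illustrated with a hand-picked table of representative code points. This Python sketch is deliberately incomplete (an assumption: it is typed in by hand, not generated from DerivedCoreProperties.txt), and the names INVISIBLE_IF_UNSUPPORTED and visible_fallback_text are hypothetical.

```python
# Hand-picked, incomplete subset of the categories listed above.
INVISIBLE_IF_UNSUPPORTED = [
    (0x200C, 0x200D),    # ZWNJ, ZWJ
    (0x200E, 0x200F),    # LEFT-TO-RIGHT MARK, RIGHT-TO-LEFT MARK
    (0x202A, 0x202E),    # LRE..RLO embedding/override controls
    (0x2066, 0x2069),    # LRI..PDI isolate controls
    (0x061C, 0x061C),    # ARABIC LETTER MARK
    (0x00AD, 0x00AD),    # SOFT HYPHEN
    (0x2060, 0x2060),    # WORD JOINER
    (0xFEFF, 0xFEFF),    # ZERO WIDTH NO-BREAK SPACE
    (0x200B, 0x200B),    # ZERO WIDTH SPACE
    (0x2061, 0x2064),    # FUNCTION APPLICATION..INVISIBLE PLUS
    (0x115F, 0x1160),    # HANGUL CHOSEONG FILLER, HANGUL JUNGSEONG FILLER
    (0x3164, 0x3164),    # HANGUL FILLER
    (0xFFA0, 0xFFA0),    # HALFWIDTH HANGUL FILLER
    (0xFE00, 0xFE0F),    # VARIATION SELECTOR-1..16
    (0xE0100, 0xE01EF),  # VARIATION SELECTOR-17..256
    (0x180B, 0x180D),    # MONGOLIAN FREE VARIATION SELECTOR ONE..THREE
]


def render_invisibly(cp: int) -> bool:
    return any(lo <= cp <= hi for lo, hi in INVISIBLE_IF_UNSUPPORTED)


def visible_fallback_text(text: str) -> str:
    """What an unsupporting renderer should actually draw.

    Only the display changes; the stored text keeps these characters.
    """
    return "".join(ch for ch in text if not render_invisibly(ord(ch)))
```

For instance, `visible_fallback_text("co\u00adop\u200d")` draws as "coop": the soft hyphen and ZWJ take up no space and leave no visible trace.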

Q: What about unsupported variation selector sequences?

A: The expected rendering behavior for a sequence consisting of a character followed by a variation selector (C + VS) is as follows:

  • If C + VS is listed in StandardizedVariants.txt and supported by the rendering system, then display with the specified glyph.

  • Otherwise, display with the normal glyph for C (with no visible rendering for the VS).
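The two rules above amount to a lookup with a fallback. The Python sketch below applies them while mapping text to glyph names; `cmap` and `variants` are hypothetical tables standing in for whatever the rendering system actually supports, and glyph names such as "zero.alt" are invented for illustration.

```python
from __future__ import annotations


def is_variation_selector(ch: str) -> bool:
    """VS1..VS256 and Mongolian FVS1..FVS3 only (FVS4, U+180F, is omitted)."""
    cp = ord(ch)
    return (0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF
            or 0x180B <= cp <= 0x180D)


def shape(text: str, cmap: dict[str, str],
          variants: dict[tuple[str, str], str]) -> list[str]:
    """Map text to glyph names, applying the C + VS fallback rule.

    cmap:     hypothetical character -> normal glyph name
    variants: hypothetical (C, VS) -> variant glyph name, i.e. the standardized
              variation sequences this rendering system supports
    """
    glyphs: list[str] = []
    i = 0
    while i < len(text):
        ch = text[i]
        if is_variation_selector(ch):
            i += 1  # stray VS with no preceding base: no visible rendering
            continue
        nxt = text[i + 1] if i + 1 < len(text) else ""
        normal = cmap.get(ch, ".notdef")
        if nxt and is_variation_selector(nxt):
            # Supported sequence -> variant glyph; otherwise the normal glyph for C.
            glyphs.append(variants.get((ch, nxt), normal))
            i += 2  # the VS itself contributes no glyph
        else:
            glyphs.append(normal)
            i += 1
    return glyphs
```

With `cmap={"0": "zero"}` and `variants={("0", "\ufe00"): "zero.alt"}`, the sequence "0\ufe00" yields ["zero.alt"], while the unsupported "0\ufe01" falls back to ["zero"].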

Q: Are there any other special cases?

A: Yes, some characters require special handling. The following format control characters, when fully supported, are displayed graphically as marks subtending or surrounding sequences of digits or, in the case of the Syriac abbreviation mark, as a line over a sequence of letters:

U+0600  ARABIC NUMBER SIGN
U+0601  ARABIC SIGN SANAH
U+0602  ARABIC FOOTNOTE MARKER
U+0603  ARABIC SIGN SAFHA
U+06DD  ARABIC END OF AYAH
U+070F  SYRIAC ABBREVIATION MARK

When not supported in rendering, the best choice is to display these with the missing glyph, to indicate the intended presence of a visible but undisplayable mark.
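These six characters are an exception to the general rule for format controls, so a renderer needs an explicit check for them. A minimal Python sketch, assuming a hypothetical renderer flag `renderer_supports_marks`; the return strings are illustrative labels, not an established API.

```python
from typing import Optional

# The six format controls listed above.
DIGIT_SPANNING_MARKS = {
    0x0600: "ARABIC NUMBER SIGN",
    0x0601: "ARABIC SIGN SANAH",
    0x0602: "ARABIC FOOTNOTE MARKER",
    0x0603: "ARABIC SIGN SAFHA",
    0x06DD: "ARABIC END OF AYAH",
    0x070F: "SYRIAC ABBREVIATION MARK",
}


def render_mark(cp: int, renderer_supports_marks: bool) -> Optional[str]:
    """Return how to draw cp, or None if cp is not one of these marks."""
    if cp not in DIGIT_SPANNING_MARKS:
        return None
    if renderer_supports_marks:
        return "spanning-mark"  # drawn across the following digits or letters
    return ".notdef"            # visible placeholder; never render these invisibly
```

Newer versions of the Unicode Character Database give characters of this kind the Prepended_Concatenation_Mark property in PropList.txt, and more of them have been encoded since this list was written, so a production renderer would likely derive the set from that property rather than hardcode it.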

Q: How does the recommendation to render a subset of default ignorable code points invisibly affect font design?

A: Fonts are best viewed in the context of the whole rendering system, since other parts of that system may handle various aspects of rendering. Where a font is designed for a rendering system that does not itself handle invisible characters (such as variation selectors), the best glyph for those characters, in the absence of other support, is a zero-width invisible glyph.
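From the font side, this means mapping such characters to a single empty glyph with zero advance width. The Python sketch below models that with a simplified, hypothetical glyph record (not a real font-library type); the glyph name "zerowidth" and the function map_invisibles are likewise illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterable


@dataclass
class Glyph:
    """Hypothetical, simplified glyph record."""
    name: str
    advance_width: int                            # in font units
    contours: list = field(default_factory=list)  # empty list: nothing is drawn


def map_invisibles(cmap: dict[int, str], glyphs: dict[str, Glyph],
                   code_points: Iterable[int]) -> None:
    """Map each not-yet-mapped code point to one shared zero-width, empty glyph.

    Existing cmap entries are kept, so characters the font already handles
    explicitly are not overridden.
    """
    glyphs.setdefault("zerowidth", Glyph("zerowidth", advance_width=0))
    for cp in code_points:
        cmap.setdefault(cp, "zerowidth")
```

Sharing one glyph keeps the font small, and leaving existing entries alone means a font that does provide real glyphs for some of these characters keeps them.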

Q: Does that mean that a font can never display one of these characters?

A: No. Rendering systems may also support special modes such as “Display Hidden”, which are intended to reveal characters that would not otherwise be displayed. Fonts can contain glyphs intended for the visible display of default ignorable code points that would otherwise be rendered invisibly when not supported.
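A "Display Hidden" mode can be thought of as a second lookup that takes priority only when the user asks for it. In this Python sketch, `reveal` is a hypothetical table of visible stand-in glyphs (for example, a dotted box labelled "ZWSP"), and the function name glyph_for is illustrative.

```python
from __future__ import annotations


def glyph_for(cp: int, cmap: dict[int, str], reveal: dict[int, str],
              display_hidden: bool) -> str:
    """Pick a glyph name for cp.

    cmap:   normal mapping (invisible characters map to an empty glyph)
    reveal: hypothetical visible stand-ins, used only in Display Hidden mode
    """
    if display_hidden and cp in reveal:
        return reveal[cp]
    return cmap.get(cp, ".notdef")
```

With `cmap={0x200B: "zerowidth"}` and `reveal={0x200B: "zwsp.reveal"}`, U+200B draws as "zerowidth" normally and as "zwsp.reveal" with the mode switched on.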

Q&A contributed by [MD] and [KW]