From: Markus Kuhn
Date: Sun Apr 11 1999 - 04:41:44 EDT

Curiously, Unicode does already contain a character with the sole
intention of representing something missing in the font:

  U+3013 GETA MARK
           o substitute for ideograph not in font

The only problem with the GETA MARK is that it looks to me just like a
very bold equals sign, and I would certainly not associate it with a
missing glyph. Its semantics specifically refer to missing ideographs
and not to missing glyphs.

I guess that U+3013 might be a fine BDF DEFAULT_CHAR value for Unicode
fonts that are primarily intended to be used for displaying CJK
ideographs, but it is a somewhat strange glyph for western users. I'd
also prefer something with a box or a question mark.
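
A minimal sketch of the fallback that a BDF DEFAULT_CHAR property
implies, assuming the font is modelled as a plain mapping from code
point to glyph. The function name lookup_glyph and the dict-based font
are hypothetical and only illustrate the idea; they are not part of any
BDF library.

```python
def lookup_glyph(glyphs, code_point, default_char):
    """Return the glyph for code_point, falling back to default_char.

    glyphs       -- dict mapping code point (int) to a glyph object
                    (hypothetical stand-in for a loaded BDF font)
    default_char -- code point named by the font's DEFAULT_CHAR property
    Returns None if neither the character nor the default is present.
    """
    if code_point in glyphs:
        return glyphs[code_point]
    # The character has no glyph: use the font's default glyph, if any.
    return glyphs.get(default_char)


# Toy font for a CJK-oriented setup, with U+3013 as the default glyph.
font = {0x41: "glyph-A", 0x3013: "glyph-geta"}
print(lookup_glyph(font, 0x41, 0x3013))    # present in the font
print(lookup_glyph(font, 0x4E00, 0x3013))  # missing, so the default is used
```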

So I think I will place something at position 0xFFFF as a default glyph
that will represent any character on the screen for which no glyph is
available in the font, and this glyph will look different from the

Another question:

If I implement a UTF-8 -> UCS-2 converter, what should I do with
malformed UTF-8 sequences? ISO 10646-1 in section 2.3c and section R.7
clearly requires that malformed UTF-8 sequences are indicated to the
user. Is replacing any malformed UTF-8 sequence by 0xFFFD an appropriate
use of this character? After all, a malformed UTF-8 sequence is in a
sense something outside the range of Unicode.
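
To make the question concrete, here is a minimal sketch of such a
converter under stated assumptions; it shows one possible policy and
does not settle what the standard requires. The function name
utf8_to_ucs2 is hypothetical. It assumes the modern restriction of
UTF-8 to sequences of at most four bytes, emits 0xFFFD for overlong
forms, encoded surrogates and code points above 0xFFFF (which UCS-2
cannot hold), and, when a sequence is structurally broken, emits one
0xFFFD for the offending byte and resynchronises at the next byte.

```python
from typing import List

REPLACEMENT = 0xFFFD  # U+FFFD REPLACEMENT CHARACTER


def utf8_to_ucs2(data: bytes) -> List[int]:
    """Decode UTF-8 bytes into a list of 16-bit UCS-2 values.

    Assumed policy (not mandated by the source): every malformed or
    unrepresentable sequence becomes 0xFFFD.
    """
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:                      # plain ASCII
            out.append(lead)
            i += 1
            continue
        if 0xC2 <= lead <= 0xDF:             # 2-byte sequence
            need, cp, minimum = 1, lead & 0x1F, 0x80
        elif 0xE0 <= lead <= 0xEF:           # 3-byte sequence
            need, cp, minimum = 2, lead & 0x0F, 0x800
        elif 0xF0 <= lead <= 0xF4:           # 4-byte sequence (beyond UCS-2)
            need, cp, minimum = 3, lead & 0x07, 0x10000
        else:
            # Stray continuation byte, overlong lead 0xC0/0xC1,
            # or a byte that can never start a sequence (0xF5..0xFF).
            out.append(REPLACEMENT)
            i += 1
            continue

        tail = data[i + 1:i + 1 + need]
        if len(tail) != need or any((b & 0xC0) != 0x80 for b in tail):
            # Truncated or interrupted sequence: flag the lead byte only
            # and resynchronise at the following byte.
            out.append(REPLACEMENT)
            i += 1
            continue

        for b in tail:
            cp = (cp << 6) | (b & 0x3F)

        overlong = cp < minimum
        surrogate = 0xD800 <= cp <= 0xDFFF
        if overlong or surrogate or cp > 0xFFFF:
            out.append(REPLACEMENT)          # one marker for the whole sequence
        else:
            out.append(cp)
        i += 1 + need
    return out


print([hex(u) for u in utf8_to_ucs2(b"a\xffb")])
```

With this policy a truncated sequence such as the first two bytes of
the euro sign yields two 0xFFFD values, one for the lead byte and one
for the orphaned continuation byte; a converter could equally well
collapse them into a single marker.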


Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at,  WWW: <>
