Curiously, Unicode does already contain a character with the sole
intention of representing something missing in the font:
U+3013 GETA MARK
* substitute for ideograph not in font
The only problem with the GETA MARK is that, to me, it looks just like a
very bold equals sign, and I would certainly not associate it with a
missing glyph. Its semantics refer specifically to missing ideographs,
not to missing glyphs in general.
I guess that U+3013 might be a fine BDF DEFAULT_CHAR value for Unicode
fonts that are primarily intended for displaying CJK ideographs, but it
is a somewhat strange glyph for Western users. I'd also prefer something
with a box or a question mark.
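To make the DEFAULT_CHAR idea concrete, here is a minimal Python sketch of how a renderer might fall back to a font's default glyph when a code point has no glyph of its own. The dict-based font, the function name lookup_glyph and the glyph strings are hypothetical stand-ins for a real BDF font structure. The sketch uses 0xFFFF as the default position, as proposed in this message; a CJK-oriented font could pass GETA MARK instead.

```python
GETA_MARK = 0x3013           # U+3013, "substitute for ideograph not in font"
DEFAULT_GLYPH_POS = 0xFFFF   # default-glyph position proposed in this message


def lookup_glyph(font: dict[int, str], cp: int,
                 default_char: int = DEFAULT_GLYPH_POS) -> str:
    """Return the glyph for code point cp, falling back to the default glyph.

    `font` is a hypothetical mapping from code points to glyph data; a real
    BDF font would instead name its fallback in the DEFAULT_CHAR property.
    """
    glyph = font.get(cp)
    if glyph is not None:
        return glyph
    # No glyph for cp: show the font's default glyph instead.
    fallback = font.get(default_char)
    if fallback is None:
        raise KeyError(f"no glyph for U+{cp:04X} and no default glyph")
    return fallback


latin = {0x41: "A", DEFAULT_GLYPH_POS: "[?]"}
print([lookup_glyph(latin, ord(c)) for c in "A\u4e00"])
```

Keeping the fallback position a parameter lets a Western font and a CJK font choose different default glyphs without changing the lookup logic.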
So I think I will place something at position 0xFFFF as a default glyph
that will represent any character on the screen for which no glyph is
available in the font, and this glyph will look different from the
If I implement a UTF-8 -> UCS-2 converter, what shall I do with
malformed UTF-8 sequences? ISO 10646-1, in section 2.3c and section R.7,
clearly requires that malformed UTF-8 sequences be indicated to the
user. Is replacing every malformed UTF-8 sequence with 0xFFFD an
appropriate use of this character? After all, a malformed UTF-8 sequence
is, in a sense, something outside the range of Unicode.
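As a minimal sketch of what the "yes" answer would look like, here is a Python UTF-8 -> UCS-2 decoder that emits U+FFFD for every malformed sequence. The function name utf8_to_ucs2 and the exact replacement policy are assumptions, not anything prescribed by ISO 10646-1: a truncated sequence yields one U+FFFD covering the bytes read so far; stray continuation bytes and 0xF8-0xFF each yield one; overlong forms and encoded surrogates are treated as malformed; and well-formed sequences above U+FFFF also become U+FFFD because UCS-2 cannot represent them.

```python
REPLACEMENT = 0xFFFD  # U+FFFD REPLACEMENT CHARACTER


def utf8_to_ucs2(data: bytes) -> list[int]:
    """Decode UTF-8 into UCS-2 code units, replacing bad input with U+FFFD."""
    out = []
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:                      # plain ASCII byte
            out.append(b)
            i += 1
            continue
        if 0xC0 <= b <= 0xDF:             # lead byte of a 2-byte sequence
            length, cp, minimum = 2, b & 0x1F, 0x80
        elif 0xE0 <= b <= 0xEF:           # lead byte of a 3-byte sequence
            length, cp, minimum = 3, b & 0x0F, 0x800
        elif 0xF0 <= b <= 0xF7:           # lead byte of a 4-byte sequence
            length, cp, minimum = 4, b & 0x07, 0x10000
        else:
            # Stray continuation byte (0x80-0xBF) or byte 0xF8-0xFF.
            out.append(REPLACEMENT)
            i += 1
            continue
        for j in range(1, length):
            if i + j >= n or data[i + j] & 0xC0 != 0x80:
                # Truncated sequence: one U+FFFD for the lead byte plus the
                # continuation bytes seen so far; resume at the offending byte.
                out.append(REPLACEMENT)
                i += j
                break
            cp = (cp << 6) | (data[i + j] & 0x3F)
        else:
            # Complete sequence: reject overlong forms, surrogates, and
            # values that do not fit into a single UCS-2 code unit.
            if cp < minimum or 0xD800 <= cp <= 0xDFFF or cp > 0xFFFF:
                out.append(REPLACEMENT)
            else:
                out.append(cp)
            i += length
    return out


print([hex(u) for u in utf8_to_ucs2(b"A\xc3\xa9\xc0\xaf\xe2\x82")])
```

Resuming at the byte that broke a sequence, rather than skipping a fixed number of bytes, keeps one corrupted byte from swallowing the valid character that follows it.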
-- Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:45 EDT