Re: Purpose of REPLACEMENT CHARACTER

From: Markus Kuhn (Markus.Kuhn@cl.cam.ac.uk)
Date: Sun Apr 11 1999 - 04:41:44 EDT


Curiously, Unicode does already contain a character with the sole
intention of representing something missing in the font:

  U+3013 GETA MARK
           o substitute for ideograph not in font

The only problem with the GETA MARK is that, to me, it looks just like a
very bold equals sign, and I would certainly not associate it with a
missing glyph. Its semantics also refer specifically to missing
ideographs, not to missing glyphs in general.

I guess that U+3013 might be a fine BDF DEFAULT_CHAR value for Unicode
fonts intended primarily for displaying CJK ideographs, but it is a
somewhat strange glyph for Western users. I'd also prefer something with
a box or a question mark.

So I think I will place a default glyph at position 0xFFFF that will
represent on screen any character for which the font has no glyph, and
this glyph will look different from the REPLACEMENT CHARACTER.
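A minimal sketch of that fallback, assuming a font is modelled as a
Python dict from code point to glyph (the names `lookup_glyph` and
`MISSING_GLYPH_POS` and the placeholder glyph strings are hypothetical,
not part of any real font API): any code point without an entry is drawn
with the glyph stored at 0xFFFF, while U+FFFD keeps its own, visibly
different glyph.

```python
# Hypothetical font model: code point -> glyph (strings stand in for bitmaps).
# Assumption: font position 0xFFFF holds a "missing glyph" box that is
# deliberately different from the glyph for U+FFFD REPLACEMENT CHARACTER.
MISSING_GLYPH_POS = 0xFFFF


def lookup_glyph(font: dict, code_point: int) -> str:
    """Return the glyph for code_point, or the font's missing-glyph box."""
    glyph = font.get(code_point)
    if glyph is None:
        # No glyph in this font: show the default glyph, not U+FFFD.
        return font[MISSING_GLYPH_POS]
    return glyph


font = {
    0x0041: "A",
    0xFFFD: "<replacement>",          # U+FFFD has its own glyph
    MISSING_GLYPH_POS: "<missing-box>",
}
```

With this table, `lookup_glyph(font, 0x4E00)` yields the missing-glyph
box, whereas text that actually contains U+FFFD still shows the
replacement glyph, so a user can tell "font lacks this character" apart
from "the data was damaged".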

Another question:

If I implement a UTF-8 -> UCS-2 converter, what shall I do with
malformed UTF-8 sequences? ISO 10646-1 in section 2.3c and section R.7
clearly requires that malformed UTF-8 sequences are indicated to the
user. Is replacing any malformed UTF-8 sequence by 0xFFFD appropriate
use of this character? After all, a malformed UTF-8 sequence is in a
sense something outside the range of Unicode.
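One possible policy, sketched below under stated assumptions: every
malformed sequence (stray continuation byte, invalid lead byte, truncated
sequence, overlong form, or encoded surrogate) becomes one U+FFFD, and
decoding resynchronizes at the first byte that is not a valid
continuation byte. Because UCS-2 cannot hold values above U+FFFF, the
sketch also maps those to U+FFFD; a UTF-16 converter would emit surrogate
pairs instead. The function name `utf8_to_ucs2` is hypothetical.

```python
REPLACEMENT = 0xFFFD


def utf8_to_ucs2(data: bytes) -> list:
    """Decode UTF-8 into UCS-2 code units, replacing malformed input by U+FFFD."""
    out = []
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:                      # plain ASCII
            out.append(b)
            i += 1
            continue
        if 0xC2 <= b <= 0xDF:             # 2-byte lead
            length, cp, min_cp = 2, b & 0x1F, 0x80
        elif 0xE0 <= b <= 0xEF:           # 3-byte lead
            length, cp, min_cp = 3, b & 0x0F, 0x800
        elif 0xF0 <= b <= 0xF4:           # 4-byte lead
            length, cp, min_cp = 4, b & 0x07, 0x10000
        else:
            # Stray continuation byte, overlong lead C0/C1, or F5..FF.
            out.append(REPLACEMENT)
            i += 1
            continue
        # Consume continuation bytes (10xxxxxx) up to the expected length.
        j = i + 1
        while j < i + length and j < n and 0x80 <= data[j] <= 0xBF:
            cp = (cp << 6) | (data[j] & 0x3F)
            j += 1
        if j < i + length:
            # Truncated or interrupted sequence; resume at the offending byte.
            out.append(REPLACEMENT)
        elif cp < min_cp or 0xD800 <= cp <= 0xDFFF or cp > 0xFFFF:
            # Overlong form, encoded surrogate, or not representable in UCS-2.
            out.append(REPLACEMENT)
        else:
            out.append(cp)
        i = j
    return out
```

For example, `utf8_to_ucs2(b"\xE2\x82A")` gives `[0xFFFD, 0x41]`: the
broken sequence is flagged, but the following "A" is not swallowed.
Python's own `bytes.decode("utf-8", errors="replace")` applies the same
idea, although its exact number of U+FFFD characters per bad sequence
may differ from this sketch's.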

ftp://ftp.informatik.uni-erlangen.de/pub/doc/ISO/charsets/ISO-10646-UTF-8.html

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:45 EDT