Re: Purpose of REPLACEMENT CHARACTER

From: John Cowan (cowan@locke.ccil.org)
Date: Sun Apr 11 1999 - 19:59:50 EDT


Markus Kuhn scripsit:

> If I implement a UTF-8 -> UCS-2 converter, what shall I do with
> malformed UTF-8 sequences? ISO 10646-1 in section 2.3c and section R.7
> clearly requires that malformed UTF-8 sequences are indicated to the
> user. Is replacing any malformed UTF-8 sequence by 0xFFFD appropriate
> use of this character? After all, a malformed UTF-8 sequence is in a
> sense something outside the range of Unicode.

The Plan 9 folks decided no: an unknown character is not the same as
an invalid encoding, which does not represent any character at all.
They map the latter to U+0080, an unused control character.
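
To make the two options concrete, here is a minimal sketch of a
UTF-8 -> UCS-2 decoder that takes the substitute code point as a
parameter. It is not Plan 9's code: the names decode_utf8, bad,
REPLACEMENT_CHARACTER and PLAN9_BAD_ENCODING are hypothetical, and the
policy of emitting one substitute per offending byte and then resuming
at the next byte is an assumption, not something either standard or
Plan 9 is claimed to mandate. Sequences longer than three bytes are
treated as malformed here because they cannot be represented in UCS-2.

```python
REPLACEMENT_CHARACTER = 0xFFFD  # U+FFFD, the option Markus asks about
PLAN9_BAD_ENCODING = 0x0080     # U+0080, the value attributed to Plan 9 above


def decode_utf8(data: bytes, bad: int = REPLACEMENT_CHARACTER) -> list:
    """Decode UTF-8 into a list of UCS-2 code points.

    Assumption: every byte that cannot start or complete a well-formed
    one- to three-byte sequence yields one `bad` code point, and
    decoding resumes at the following byte.
    """
    out = []
    i = 0
    n = len(data)
    while i < n:
        b0 = data[i]
        if b0 < 0x80:
            # Single-byte (ASCII) sequence.
            out.append(b0)
            i += 1
            continue
        if 0xC2 <= b0 <= 0xDF:
            need, cp, lowest = 1, b0 & 0x1F, 0x80
        elif 0xE0 <= b0 <= 0xEF:
            need, cp, lowest = 2, b0 & 0x0F, 0x800
        else:
            # Stray continuation byte, overlong lead (0xC0, 0xC1), or a
            # lead byte for a sequence that does not fit in UCS-2.
            out.append(bad)
            i += 1
            continue
        tail = data[i + 1:i + 1 + need]
        if len(tail) < need or any(t & 0xC0 != 0x80 for t in tail):
            # Truncated sequence or a non-continuation byte in the tail.
            out.append(bad)
            i += 1
            continue
        for t in tail:
            cp = (cp << 6) | (t & 0x3F)
        if cp < lowest or 0xD800 <= cp <= 0xDFFF:
            # Overlong form or an encoded surrogate: also malformed.
            out.append(bad)
            i += 1
            continue
        out.append(cp)
        i += 1 + need
    return out
```

Calling decode_utf8(data) follows the U+FFFD convention, while
decode_utf8(data, bad=PLAN9_BAD_ENCODING) follows the U+0080 one; the
decoding logic itself is the same either way.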

-- 
John Cowan					cowan@ccil.org
		e'osai ko sarji la lojban.


