Markus Kuhn scripsit:
> If I implement a UTF-8 -> UCS-2 converter, what shall I do with
> malformed UTF-8 sequences? ISO 10646-1 in section 2.3c and section R.7
> clearly requires that malformed UTF-8 sequences are indicated to the
> user. Is replacing any malformed UTF-8 sequence by 0xFFFD appropriate
> use of this character? After all, a malformed UTF-8 sequence is in a
> sense something outside the range of Unicode.
The Plan 9 folks decided no: an unknown character is not the same thing as
an invalid encoding, which does not represent any character at all.
They map the latter to U+0080, an otherwise unused control character.
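
To make the two choices concrete, here is a minimal sketch of a UTF-8 -> UCS-2
converter whose error value is a parameter. It is not Plan 9's code; the names
utf8_to_ucs2, RUNE_ERROR_PLAN9 and REPLACEMENT_CHAR are made up for this
illustration, and the policy of emitting one error value per malformed
sequence (rather than one per bad byte) is an assumption, not something
either convention is known to prescribe.

```python
# Hypothetical names, for illustration only.
RUNE_ERROR_PLAN9 = 0x0080  # error value the reply attributes to Plan 9
REPLACEMENT_CHAR = 0xFFFD  # error value proposed in the quoted question


def utf8_to_ucs2(data, bad=RUNE_ERROR_PLAN9):
    """Decode UTF-8 bytes into a list of 16-bit UCS-2 values.

    Every malformed sequence, and every sequence that does not fit
    in UCS-2, is replaced by the single value `bad`.
    """
    out = []
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        if b0 < 0x80:
            # Plain ASCII byte.
            out.append(b0)
            i += 1
            continue
        if 0xC0 <= b0 <= 0xDF:
            need, cp, lowest = 1, b0 & 0x1F, 0x80
        elif 0xE0 <= b0 <= 0xEF:
            need, cp, lowest = 2, b0 & 0x0F, 0x800
        elif 0xF0 <= b0 <= 0xF7:
            # Four-byte form: always beyond UCS-2, consumed as one unit.
            need, cp, lowest = 3, b0 & 0x07, 0x10000
        else:
            # Stray continuation byte (0x80-0xBF) or 0xF8-0xFF.
            out.append(bad)
            i += 1
            continue
        # Consume up to `need` continuation bytes (10xxxxxx).
        j = i + 1
        while j < n and j - i <= need and (data[j] & 0xC0) == 0x80:
            cp = (cp << 6) | (data[j] & 0x3F)
            j += 1
        complete = (j - i - 1) == need
        ok = (
            complete
            and cp >= lowest                  # reject overlong forms
            and cp <= 0xFFFF                  # must fit in UCS-2
            and not (0xD800 <= cp <= 0xDFFF)  # reject surrogates
        )
        out.append(cp if ok else bad)
        i = j
    return out
```

Calling utf8_to_ucs2(data) gives the U+0080 behaviour, and
utf8_to_ucs2(data, REPLACEMENT_CHAR) gives the U+FFFD behaviour asked about.
One consequence of the U+0080 choice shows up in the sketch: a well-formed
encoding of U+0080 itself (bytes C2 80) produces the same output value as an
error, so the caller cannot tell the two apart from the output alone.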
-- John Cowan cowan@ccil.org e'osai ko sarji la lojban.