Re: Purpose of REPLACEMENT CHARACTER

From: Markus Kuhn (Markus.Kuhn@cl.cam.ac.uk)
Date: Tue Apr 13 1999 - 04:22:30 EDT

Next message: Christian Wittern: "RE: [Proposal] Extended UTF-16 by using Plane 14"
Previous message: Geoffrey Waigh: "Re: [Proposal] Extended UTF-16 by using Plane 14"
Maybe in reply to: Markus Kuhn: "Purpose of REPLACEMENT CHARACTER"

John Cowan wrote on 1999-04-11 23:54 UTC:
> > If I implement a UTF-8 -> UCS-2 converter, what shall I do with
> > malformed UTF-8 sequences? ISO 10646-1 in section 2.3c and section R.7
> > clearly requires that malformed UTF-8 sequences are indicated to the
> > user. Is replacing any malformed UTF-8 sequence by 0xFFFD appropriate
> > use of this character? After all, a malformed UTF-8 sequence is in a
> > sense something outside the range of Unicode.
>
> The Plan 9 folks decided no, that an unknown character is not the same as
> an invalid encoding which does not represent any character.
> They map the latter into U+0080, an unused control character.

U+0080 seems a rather arbitrary pick to me, and it will show up on Windows
as the euro sign, because CP1252 assigns the euro sign to 0x80. If they
wanted to use a control character, then a better choice would have been
U+001A (ASCII SUB), because according to ISO 6429 and ECMA-48
<ftp://ftp.ecma.ch/ECMA-ST/E048-PDF.PDF>, section 8.3.148:

  "SUB is used in the place of a character that has been found
  to be invalid or in error. SUB is intended to be introduced by
  automatic means."

It is just not clear to me whether I should introduce a new glyph to be
associated with a C0 control character.
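
To make the three candidate policies concrete (U+FFFD, the Plan 9 choice
U+0080, and SUB U+001A), here is a minimal sketch in Python. It assumes
Python's built-in UTF-8 decoder and a Python string as the target instead
of a real UCS-2 buffer. The handler names "utf8-sub" and "utf8-c1" and the
function decode_utf8 are hypothetical, chosen only for this illustration.

```python
import codecs

SUB = "\u001A"    # ASCII SUB, the substitute described in ECMA-48
PLAN9 = "\u0080"  # the C1 control character reportedly used by Plan 9


def make_handler(substitute):
    """Build a decode error handler that emits `substitute` for each
    malformed sequence reported by the decoder."""
    def handler(err):
        if not isinstance(err, UnicodeDecodeError):
            raise err
        # Return the substitute and resume decoding after the bad bytes.
        return (substitute, err.end)
    return handler


# Hypothetical handler names, registered for this sketch only.
codecs.register_error("utf8-sub", make_handler(SUB))
codecs.register_error("utf8-c1", make_handler(PLAN9))


def decode_utf8(data, policy="replace"):
    """Decode UTF-8 bytes; `policy` selects what marks malformed input.
    The built-in "replace" policy emits U+FFFD."""
    return data.decode("utf-8", errors=policy)


malformed = b"A\xffB"  # 0xFF can never appear in well-formed UTF-8
for policy in ("replace", "utf8-sub", "utf8-c1"):
    print(policy, [hex(ord(c)) for c in decode_utf8(malformed, policy)])
```

With U+FFFD the error stays visible to the user through an ordinary glyph,
whereas the two control-character policies depend on how the display layer
chooses to render a control code.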

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

