Re: illegal UTF-8 sequences and mbtowc()

From: John Cowan (cowan@ccil.org)
Date: Mon Nov 01 1999 - 11:48:32 EST


Markus Kuhn wrote:

> c) Let's extend UTF-16 to provide an encoding of malformed UTF-8 sequences.
> For instance, we could define in Plane 14 255 bytes that represent
> bytes which were part of an illegal UTF-8 sequence.

Better yet, let's use them to represent arbitrary octets! Then we can
have characters OCTET 00 through OCTET FF, and any binary stuff can be
embedded in Unicode (at a fourfold increase in size for either UTF-8 or
UTF-16).
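
To make the fourfold figure concrete, here is a minimal sketch in Python.
It assumes a purely hypothetical base code point (U+E0800, picked only
because it lies in Plane 14; no OCTET characters are actually assigned
there), and the function names are illustrative. Any code point above
U+FFFF takes four bytes in UTF-8 and a surrogate pair (also four bytes)
in UTF-16, so each embedded octet costs four.

```python
# Hypothetical base code point in Plane 14; no such OCTET characters are
# actually assigned. Chosen only for illustration.
OCTET_BASE = 0xE0800


def octets_to_text(data: bytes) -> str:
    """Map each byte to the hypothetical character OCTET xx."""
    return "".join(chr(OCTET_BASE + b) for b in data)


def text_to_octets(text: str) -> bytes:
    """Inverse mapping; rejects anything outside OCTET 00..FF."""
    out = bytearray()
    for ch in text:
        value = ord(ch) - OCTET_BASE
        if not 0 <= value <= 0xFF:
            raise ValueError(f"not an OCTET character: U+{ord(ch):04X}")
        out.append(value)
    return bytes(out)


data = bytes(range(256))          # every possible octet once
text = octets_to_text(data)

utf8_len = len(text.encode("utf-8"))       # 4 bytes per character
utf16_len = len(text.encode("utf-16-be"))  # one surrogate pair per character

print(utf8_len // len(data), utf16_len // len(data))  # 4 4
```

The round trip through text_to_octets() recovers the original bytes, so
the scheme is lossless; the only cost is the size.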

-- 

John Cowan   http://www.reutershealth.com   jcowan@reutershealth.com
Schlingt dreifach einen Kreis vom dies / Schliess eurer Aug vor heiliger Schau
Den er genoss vom Honig-Tau / Und trank die Milch vom Paradies.
                        -- Coleridge (tr. Politzer)


