Henning Brunzel wrote on 1999-10-30 10:39 UTC:
> Markus Kuhn wrote:
> > c) Let's extend UTF-16 to provide an encoding of malformed UTF-8 sequences.
> > For instance, we could define 255 code points in Plane 14 that represent
> > bytes which were part of an illegal UTF-8 sequence. This would allow
> > loss-less UTF-8 -> UTF-16 -> UTF-8 conversion even for arbitrary random
> > byte-strings that do not look anything like valid UTF-8.
> IIRC the starting point for this was to get only one code per
> malformed sequence instead of one per byte. This proposal would actually
> get two codes per byte. Am I missing something?
Think of a surrogate pair as a single code in UTF-16.
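A minimal Python sketch of the Plane 14 escape idea, assuming a hypothetical escape range U+EFE00..U+EFEFF (one code point per byte value for simplicity; the message does not say which Plane 14 code points would be used). The names `ESCAPE_BASE`, `plane14escape`, `utf8_to_text`, `text_to_utf8` and `surrogate_pair` are illustrative, not from any standard. It also shows the point of the reply: each escaped byte becomes one code point, which UTF-16 stores as a single surrogate pair, i.e. two 16-bit code units.

```python
import codecs
from typing import Tuple

# Hypothetical escape range in Plane 14 (an assumption, not part of the proposal):
# byte value b is represented by code point ESCAPE_BASE + b.
ESCAPE_BASE = 0xEFE00


def _escape_bad_bytes(exc):
    """Decode error handler: map each byte of an ill-formed sequence to one escape code point."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    return "".join(chr(ESCAPE_BASE + b) for b in bad), exc.end


codecs.register_error("plane14escape", _escape_bad_bytes)


def utf8_to_text(data: bytes) -> str:
    """Decode UTF-8, escaping every byte that is part of an ill-formed sequence."""
    return data.decode("utf-8", errors="plane14escape")


def text_to_utf8(text: str) -> bytes:
    """Encode to UTF-8, turning escape code points back into their original raw bytes."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if ESCAPE_BASE <= cp <= ESCAPE_BASE + 0xFF:
            out.append(cp - ESCAPE_BASE)  # restore the original byte
        else:
            out += ch.encode("utf-8")
    return bytes(out)


def surrogate_pair(cp: int) -> Tuple[int, int]:
    """Split a supplementary code point (>= U+10000) into UTF-16 high and low surrogates."""
    v = cp - 0x10000
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


if __name__ == "__main__":
    raw = b"ok \xff\xc3( end"  # \xff and \xc3 are ill-formed here
    text = utf8_to_text(raw)
    print(len(text))  # 10 code points
    print(len(text.encode("utf-16-le")) // 2)  # 12 code units: each escape is one surrogate pair
    assert text_to_utf8(text) == raw  # loss-less round trip
```

Python's built-in `surrogateescape` error handler uses a similar one-code-per-byte idea, but maps bytes to lone low surrogates U+DC80..U+DCFF instead of Plane 14 code points. Like any such scheme, this sketch only round-trips byte strings that do not already contain valid UTF-8 for the escape code points themselves.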
Markus
-- Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:54 EDT