Re: UTF-8 ill-formed question

From: Otto Stolz
Date: Sun, 16 Dec 2012 13:13:27 +0100


On 2012-12-15, Philippe Verdy wrote:
> But there's still a bug (or request for enhancement) for your Pocket
> converters:
> - For UTF-16 you correctly exclude the range U+D800..U+DFFF (surrogates)
> from the sets of convertible codepoints.
> - But you don't exclude this range in the case of your UTF-8 and UTF-32
> "magic encoders" which could forget this case. Of course your encoder would
> create distinct sequences for these code points, but they are not valid
> UTF-8 or valid UTF-32 encodings.

Only the UTF-16 variant is really *my* “magic pocket encoder” (MPE);
the author is named on each of the three.

I would not demand more from those MPEs than converting
a valid UCS character to a valid and equivalent UTF
sequence, and illustrating the underlying algorithm.
I guess they were originally meant as jokes, at least
partially; I have used them as a didactic device in my
beginners' lecture on Unicode.
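
As a rough sketch of the kind of algorithm such an encoder illustrates,
here is a UTF-8 conversion in Python that includes the surrogate check
asked for above. The function name encode_utf8 is hypothetical and is
not taken from any of the MPEs; it only assumes the standard UTF-8 bit
layout.

```python
def encode_utf8(cp: int) -> bytes:
    """Encode one Unicode scalar value as UTF-8 (illustrative helper)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    if 0xD800 <= cp <= 0xDFFF:
        # Surrogates are code points, but not scalar values; their
        # three-byte form would be ill-formed UTF-8.
        raise ValueError("surrogate code points cannot be encoded")
    if cp < 0x80:        # 1 byte:  0xxxxxxx
        return bytes([cp])
    if cp < 0x800:       # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])
```

Without the surrogate branch, the three-byte case would still produce
a distinct byte sequence for U+D800..U+DFFF, which is exactly the
ill-formed output the quoted message warns about.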

Clearly, Mike Ayers made the point that the UTF-32 encoding
is nothing but a simple shortcut (in terms of its two
predecessors). His single-row MPE expresses this quite
aptly, and any additional branch would spoil the impression.
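
To make the "one row" concrete, here is a minimal Python sketch, not
Mike Ayers' actual MPE: the code unit is simply the code point written
as a 32-bit integer. The function name encode_utf32be and the strict
flag are hypothetical, and big-endian byte order is an arbitrary choice
for the illustration.

```python
import struct

def encode_utf32be(cp: int, strict: bool = False) -> bytes:
    """Write a code point as one big-endian 32-bit code unit."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    if strict and 0xD800 <= cp <= 0xDFFF:
        # The "additional branch": reject surrogates so that the
        # output is always well-formed UTF-32.
        raise ValueError("surrogate code points cannot be encoded")
    return struct.pack(">I", cp)
```

With strict=False, a surrogate passes straight through as a distinct
but invalid code unit; with strict=True the extra branch rejects it.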

The reason I excluded the surrogates from my UTF-16 MPE
was really that I needed additional space for the user’s
guide on the reverse side.

   Otto Stolz
Received on Sun Dec 16 2012 - 06:19:14 CST