RE: UTF-8 ill-formed question

From: Doug Ewell <>
Date: Tue, 11 Dec 2012 14:15:43 -0700

Ian Clifton <ian dot clifton at chem dot ox dot ac dot uk> wrote:

>> Does anyone know why ill-formed sequences occur in UTF-8? Besides the
>> fact that they don't follow the pattern of UTF-8 byte sequences, I'm
>> just wondering how or why.
> There’s a lot about the conditions for the well-formedness of UTF-8
> sequences in Chapter 3 of the Standard:
> [...]
> Even if these conditions hold, however, a UTF-8 sequence might still
> be ill-formed; Table 3-7 exhaustively lists all the cases.
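
Table 3-7 can be applied mechanically. What follows is a minimal sketch, assuming Python 3. The names `WELL_FORMED_RANGES` and `first_ill_formed_offset` are hypothetical, not from the standard or any library. Each row of the table is stored as a list of allowed byte ranges, and the function reports the offset of the first sequence that fits no row. It is meant as an illustration, not a production decoder.

```python
from typing import List, Optional, Tuple

# Hypothetical encoding of Table 3-7: one entry per row, giving the
# allowed (low, high) range for each byte position of a well-formed sequence.
WELL_FORMED_RANGES: List[List[Tuple[int, int]]] = [
    [(0x00, 0x7F)],                                          # U+0000..U+007F
    [(0xC2, 0xDF), (0x80, 0xBF)],                            # U+0080..U+07FF
    [(0xE0, 0xE0), (0xA0, 0xBF), (0x80, 0xBF)],              # U+0800..U+0FFF
    [(0xE1, 0xEC), (0x80, 0xBF), (0x80, 0xBF)],              # U+1000..U+CFFF
    [(0xED, 0xED), (0x80, 0x9F), (0x80, 0xBF)],              # U+D000..U+D7FF
    [(0xEE, 0xEF), (0x80, 0xBF), (0x80, 0xBF)],              # U+E000..U+FFFF
    [(0xF0, 0xF0), (0x90, 0xBF), (0x80, 0xBF), (0x80, 0xBF)],  # U+10000..U+3FFFF
    [(0xF1, 0xF3), (0x80, 0xBF), (0x80, 0xBF), (0x80, 0xBF)],  # U+40000..U+FFFFF
    [(0xF4, 0xF4), (0x80, 0x8F), (0x80, 0xBF), (0x80, 0xBF)],  # U+100000..U+10FFFF
]


def first_ill_formed_offset(data: bytes) -> Optional[int]:
    """Return the offset where the first ill-formed sequence starts, or None."""
    i = 0
    while i < len(data):
        # Lead-byte ranges in the table do not overlap, so at most one row applies.
        row = next((r for r in WELL_FORMED_RANGES
                    if r[0][0] <= data[i] <= r[0][1]), None)
        if row is None:
            return i  # bytes 80..C1 and F5..FF never start a sequence
        seq = data[i:i + len(row)]
        if len(seq) < len(row):
            return i  # sequence is truncated by the end of the data
        if not all(low <= b <= high for b, (low, high) in zip(seq, row)):
            return i  # a trailing byte falls outside its allowed range
        i += len(row)
    return None
```

With this sketch, `first_ill_formed_offset(bytes([0xE4, 0xBA, 0x8C]))` returns `None`. Both `b"\xC0\xAF"` (an overlong encoding of "/") and `b"\xED\xA0\x80"` (an encoded surrogate, U+D800) return `0`, because no row of the table admits them.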

But the bottom line is, there's nothing ill-formed about James' original
example. It's perfectly good UTF-8. The visual similarity between the
digits in U+4E8C and the first and last bytes in <E4 BA 8C> is mostly
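
To see where the bytes in James' example come from, here is a minimal hand-written encoder for one code point, assuming Python 3. The function name `encode_utf8` is hypothetical; in practice `chr(cp).encode("utf-8")` does the same job. For a three-byte sequence the lead byte is `0xE0 | (cp >> 12)`, so its low hex digit is always the code point's first hex digit. The last byte is `0x80 | (cp & 0x3F)`, so its low hex digit is always the code point's last hex digit. The middle byte carries the remaining six bits.

```python
def encode_utf8(cp: int) -> bytes:
    """Hypothetical helper: encode one Unicode scalar value using the UTF-8 bit layout."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"U+{cp:04X} is not a Unicode scalar value")
    if cp <= 0x7F:
        return bytes([cp])                       # 0xxxxxxx
    if cp <= 0x7FF:
        return bytes([0xC0 | (cp >> 6),          # 110yyyyy
                      0x80 | (cp & 0x3F)])       # 10xxxxxx
    if cp <= 0xFFFF:
        return bytes([0xE0 | (cp >> 12),         # 1110zzzz: top 4 bits
                      0x80 | ((cp >> 6) & 0x3F), # 10yyyyyy: middle 6 bits
                      0x80 | (cp & 0x3F)])       # 10xxxxxx: bottom 6 bits
    return bytes([0xF0 | (cp >> 18),             # 11110uuu
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


print(encode_utf8(0x4E8C).hex(" "))  # prints "e4 ba 8c"
```

For U+4E8C, the bits 0100 1110 1000 1100 split as 0100 | 111010 | 001100, which gives E4, BA and 8C.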

Doug Ewell | Thornton, Colorado, USA | @DougEwell
Received on Tue Dec 11 2012 - 15:18:43 CST