Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

From: Alastair Houghton via Unicode
Date: Tue, 23 May 2017 10:17:06 +0100

On 23 May 2017, at 07:10, Jonathan Coxhead via Unicode wrote:
> On 18/05/2017 1:58 am, Alastair Houghton via Unicode wrote:
>> On 18 May 2017, at 07:18, Henri Sivonen via Unicode wrote:
>>> the decision complicates U+FFFD generation when validating UTF-8 by state machine.
>> It *really* doesn’t. Even if you’re hell-bent on using a pure state machine approach, you need to add maybe two additional error states (two-trailing-bytes-to-eat-then-fffd and one-trailing-byte-to-eat-then-fffd) on top of the states you already have. The implementation complexity argument is a *total* red herring.
> Heh. A state machine with N+2 states is, a fortiori, more complex than one with N states. So I think your argument is self-contradictory.

You’re being overly pedantic (and in this case, actually, the cyclomatic complexity of the state machine wouldn’t increase). In any case, Henri is complaining that it’s too difficult to implement; it isn’t. You need two extra states, both of which are trivial.
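
As an illustration of the two extra states, here is a minimal sketch of a byte-at-a-time decoder in Python. It is not taken from the proposal or from any real implementation: the state names, the function name `decode`, and the choices to route C0/C1 through the one-byte-to-eat state and to replace F5..FF and stray trailing bytes immediately are all assumptions made for this sketch.

```python
REPLACEMENT = "\ufffd"

# Ordinary states: the number of trailing bytes still expected.
START, NEED1, NEED2, NEED3 = 0, 1, 2, 3
# The two extra error states (hypothetical names, chosen for this sketch).
EAT2_THEN_FFFD, EAT1_THEN_FFFD = 4, 5


def decode(data: bytes) -> str:
    """Decode UTF-8, emitting one U+FFFD per structurally complete bad sequence."""
    out = []
    state = START
    cp = 0
    lo, hi = 0x80, 0xBF  # valid range for the first trailing byte
    i = 0
    while i < len(data):
        b = data[i]
        if state == START:
            lo, hi = 0x80, 0xBF
            if b < 0x80:
                out.append(chr(b))
            elif b in (0xC0, 0xC1):
                # Assumption: an overlong two-byte lead eats one trailing byte.
                state = EAT1_THEN_FFFD
            elif 0xC2 <= b <= 0xDF:
                state, cp = NEED1, b & 0x1F
            elif 0xE0 <= b <= 0xEF:
                state, cp = NEED2, b & 0x0F
                if b == 0xE0:
                    lo = 0xA0  # below this the sequence would be overlong
                if b == 0xED:
                    hi = 0x9F  # above this it would encode a surrogate
            elif 0xF0 <= b <= 0xF4:
                state, cp = NEED3, b & 0x07
                if b == 0xF0:
                    lo = 0x90  # below this the sequence would be overlong
                if b == 0xF4:
                    hi = 0x8F  # above this it would exceed U+10FFFF
            else:
                # Assumption: stray trailing byte or F5..FF, replaced at once.
                out.append(REPLACEMENT)
            i += 1
        elif not 0x80 <= b <= 0xBF:
            # Sequence cut short: emit U+FFFD and reprocess b from START.
            out.append(REPLACEMENT)
            state = START
        else:
            if state == EAT2_THEN_FFFD:
                state = EAT1_THEN_FFFD
            elif state == EAT1_THEN_FFFD:
                out.append(REPLACEMENT)
                state = START
            elif not lo <= b <= hi:
                # Trailing byte in form but not in range: enter an error state.
                state = EAT1_THEN_FFFD if state == NEED2 else EAT2_THEN_FFFD
            else:
                cp = (cp << 6) | (b & 0x3F)
                lo, hi = 0x80, 0xBF
                state -= 1
                if state == START:
                    out.append(chr(cp))
            i += 1
    if state != START:
        out.append(REPLACEMENT)  # input ended in the middle of a sequence
    return "".join(out)
```

With this sketch, `decode(b"\xed\xa0\x80")` yields a single U+FFFD, where replacing each maximal subpart separately would yield three. Apart from the two added states, the only new logic is the one branch that enters them.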

The point I was making was that this is not a strong argument against the proposed change, *even if* we were treating it as a requirement, which it isn’t.

Kind regards,


Received on Tue May 23 2017 - 04:17:39 CDT
