Re: Corrigendum #9

From: Mark Davis ☕️ <>
Date: Tue, 3 Jun 2014 08:55:09 +0200

On Mon, Jun 2, 2014 at 10:32 PM, David Starner <> wrote:

> Why? It seems you're changing the rules
> ...
This isn't "are changing", it is "has changed". The Corrigendum was issued
at the start of 2013, about 16 months ago, and it applies to all relevant
earlier versions. It was the result of fairly extensive debate inside the
UTC; there hasn't been a single issue on this thread that wasn't considered
during the discussions there. And as far back as 2001, the UTC made it
clear that noncharacters *are* scalar values, and are to be converted by
UTF converters. E.g., see (by chance,
one day before 9/11).
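
To make the "converted by UTF converters" point concrete, here is a minimal
sketch in Python. It assumes CPython's built-in codecs as the converters; the
helper names noncharacters() and round_trips() are made up for illustration.
It builds the 66 noncharacters (U+FDD0..U+FDEF plus the last two code points
of each of the 17 planes) and checks that each one survives an encode/decode
round trip in UTF-8, UTF-16 and UTF-32.

```python
def noncharacters():
    """Return the 66 noncharacter code points as a list of ints."""
    cps = list(range(0xFDD0, 0xFDF0))          # U+FDD0..U+FDEF (32 code points)
    for plane in range(17):                    # planes 0..16
        cps.append(plane * 0x10000 + 0xFFFE)   # U+nFFFE
        cps.append(plane * 0x10000 + 0xFFFF)   # U+nFFFF
    return cps


# Explicit byte orders are used so that no BOM handling gets in the way.
ENCODINGS = ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be")


def round_trips(cp, encoding):
    """True if the code point encodes and decodes back to itself."""
    s = chr(cp)
    return s.encode(encoding).decode(encoding) == s


all_ok = all(
    round_trips(cp, enc) for cp in noncharacters() for enc in ENCODINGS
)
```

If a converter rejected or rewrote any of these code points, all_ok would be
False, or an exception would be raised during the round trip.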

> probably trigger serious bugs in some lamebrained utility.

There were already plenty of programs that passed the noncharacters
through; very few would filter them (some would delete them, which is
horrible for security). Thinking that a utility would never encounter them
in input text was a pipe-dream. If a utility or library is so fragile that
it *breaks* on input of any valid UTF sequence, then it *is* a "lamebrained"
utility. A good unit test for any production chain would be to check that
there is no crash on any input scalar value (and for that matter, on any
ill-formed UTF text).
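
A rough sketch of such a unit test, in Python. Everything named here is an
assumption for illustration: process() is a hypothetical stand-in for the
utility under test (it only normalizes to NFC), and the ill-formed samples
are a small hand-picked set, not an exhaustive one. The first check feeds
every scalar value (U+0000..U+10FFFF minus the surrogates) through the
function; the second decodes ill-formed UTF-8 with replacement and feeds the
result through as well.

```python
import unicodedata


def process(text):
    """Hypothetical utility under test; replace with the real entry point."""
    return unicodedata.normalize("NFC", text)


def scalar_values():
    """Yield every Unicode scalar value (all code points except surrogates)."""
    for cp in range(0x110000):
        if not 0xD800 <= cp <= 0xDFFF:
            yield cp


def check_all_scalar_values(func):
    """Call func on each scalar value; return a list of (cp, exception)."""
    failures = []
    for cp in scalar_values():
        try:
            func(chr(cp))
        except Exception as exc:  # any exception counts as a failure here
            failures.append((cp, exc))
    return failures


# A few ill-formed UTF-8 byte sequences (assumed sample, not exhaustive).
ILL_FORMED = [
    b"\xff",              # byte that never appears in UTF-8
    b"\xc0\x80",          # overlong encoding of U+0000
    b"\xed\xa0\x80",      # encoded surrogate U+D800
    b"\xe2\x82",          # truncated three-byte sequence
    b"\xf4\x90\x80\x80",  # beyond U+10FFFF
]


def check_ill_formed(func):
    """Decode each sample with U+FFFD replacement and pass it to func."""
    decoded = [raw.decode("utf-8", errors="replace") for raw in ILL_FORMED]
    for text in decoded:
        func(text)
    return decoded
```

An empty list from check_all_scalar_values(process) means no scalar value,
noncharacters included, made the function raise. A real test would also want
to cover longer strings and the other encoding forms.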

Unicode mailing list
Received on Tue Jun 03 2014 - 01:57:23 CDT