Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

From: Richard Wordingham via Unicode <>
Date: Tue, 16 May 2017 09:00:13 +0100

On Tue, 16 May 2017 10:01:03 +0300
Henri Sivonen via Unicode <> wrote:

> Even so, I think even changing a recommendation of "best practice"
> needs way better rationale than "feels right" or "ICU already does it"
> when a) major browsers (which operate in the most prominent
> environment of broken and hostile UTF-8) agree with the
> currently-recommended best practice and b) the currently-recommended
> best practice makes more sense for implementations where "UTF-8
> decoding" is actually mere "UTF-8 validation".

There was originally an attempt to prescribe rather than to recommend
the interpretation of ill-formed 8-bit Unicode strings. It may even
briefly have been an issued prescription, until common sense prevailed.
I do remember a sinking feeling when I thought I would have to change
my own handling of bogus UTF-8, only to be relieved later when it
became mere best practice. However, it is not uncommon for coding
standards to prescribe 'best practice'.

Received on Tue May 16 2017 - 03:00:38 CDT
