Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8 from Asmus Freytag (c) via Unicode on 2017-05-23 (Unicode Mail List Archive)

From: Asmus Freytag (c) via Unicode <unicode_at_unicode.org>
Date: Tue, 23 May 2017 11:20:23 -0700

On 5/23/2017 10:45 AM, Markus Scherer wrote:
> On Tue, May 23, 2017 at 7:05 AM, Asmus Freytag via Unicode
> <unicode_at_unicode.org> wrote:
>
> So, if the proposal for Unicode really was more of a "feels right"
> and not a "deviate at your peril" situation (or necessary escape
> hatch), then we are better off not making a RECOMMENDATION that
> goes against collective practice.
>
>
> I think the standard is quite clear about this:
>
> Although a UTF-8 conversion process is required to never consume
> well-formed subsequences as part of its error handling for
> ill-formed subsequences, such a process is not otherwise
> constrained in how it deals with any ill-formed subsequence
> itself. An ill-formed subsequence consisting of more than one code
> unit could be treated as a single error or as multiple errors.
>
>
And why add a recommendation that changes that from completely up to the
implementation (or groups of implementations) to something where one way
of doing it now has to justify itself?

If the thread has made one thing clear, it is that there's no consensus in
the wider community that one approach is obviously better. When it comes
to ill-formed sequences, all bets are off. Simple as that.
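
To make the latitude in the quoted passage concrete, here is a minimal
Python sketch of two ways a decoder might treat the same ill-formed
subsequence: one U+FFFD per offending byte, or one U+FFFD for the whole
run. The function names and the "collapse" flag are made up for
illustration. Neither policy is claimed to match the proposal, the existing
recommendation, or any particular implementation, and the collapsing
variant is deliberately crude (it also merges adjacent, distinct ill-formed
sequences). Both respect the one hard requirement: well-formed
subsequences are never consumed as part of error handling.

```python
REPLACEMENT = "\ufffd"


def well_formed_length(data: bytes, i: int) -> int:
    """Length of the well-formed UTF-8 sequence starting at i, or 0 if none."""
    for n in (1, 2, 3, 4):
        chunk = data[i:i + n]
        if len(chunk) < n:
            break  # ran off the end of the input
        try:
            chunk.decode("utf-8")  # strict decoding rejects ill-formed input
            return n
        except UnicodeDecodeError:
            continue
    return 0


def decode_lossy(data: bytes, collapse: bool) -> str:
    """Decode UTF-8, replacing ill-formed bytes with U+FFFD.

    collapse=False: every ill-formed byte is its own error.
    collapse=True:  a contiguous run of ill-formed bytes is a single error.
    (Hypothetical policies, chosen only to illustrate the two extremes.)
    """
    out = []
    i = 0
    in_error = False
    while i < len(data):
        n = well_formed_length(data, i)
        if n:
            out.append(data[i:i + n].decode("utf-8"))
            i += n
            in_error = False
        else:
            if not (collapse and in_error):
                out.append(REPLACEMENT)
            in_error = True
            i += 1  # skip exactly one byte; never swallow well-formed data
    return "".join(out)


sample = b"A\xe0\x80\x80B"  # E0 80 80 is ill-formed
print(ascii(decode_lossy(sample, collapse=False)))
print(ascii(decode_lossy(sample, collapse=True)))
```

For the sample input, the first call yields three replacement characters
between "A" and "B" and the second yields one; in both cases "A" and "B"
survive. Under the text quoted above, either result is conformant.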

Adding a "recommendation" this late in the game is just bad standards
policy.

A./
Received on Tue May 23 2017 - 13:20:49 CDT
