RE: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8 from Shawn Steele via Unicode on 2017-05-30 (Unicode Mail List Archive)

From: Shawn Steele via Unicode <unicode_at_unicode.org>
Date: Tue, 30 May 2017 17:11:40 +0000

> Which is to completely reverse the current recommendation in Unicode 9.0. While I agree that this might help you fend off a bug report, it would create chances for bug reports for Ruby, Python3, many if not all Web browsers,...

& Windows & .NET

Changing the behavior of the Windows / .NET SDK is a non-starter.
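For reference, here is a small sketch of what the current Unicode 9.0 practice looks like in Python 3, one of the implementations named above. It assumes CPython 3.3 or later, whose UTF-8 decoder emits one U+FFFD per "maximal subpart" of an ill-formed sequence. The sample byte strings and the `counts` dictionary are illustrative only.

```python
# Sketch: count the U+FFFD substitutions made by Python 3's UTF-8 decoder.
# Assumes CPython 3.3+, which replaces each "maximal subpart" of an
# ill-formed sequence with one U+FFFD (the Unicode 9.0 recommendation).
REPLACEMENT = "\ufffd"

samples = {
    "overlong '/' (C0 AF)": b"\xc0\xaf",
    "truncated 3-byte sequence (E2 82)": b"\xe2\x82",
}

counts = {}
for label, raw in samples.items():
    decoded = raw.decode("utf-8", errors="replace")
    counts[label] = decoded.count(REPLACEMENT)
    print(f"{label}: {counts[label]} x U+FFFD")
```

Because C0 can never start a well-formed sequence, C0 and AF each become their own U+FFFD, while the truncated E2 82 prefix collapses to a single one. A change that treats the overlong as one unit would presumably alter the first count, which is the kind of behavior change being objected to here.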

> Essentially, "overlong" is a word like "dragon" or "ghost": Everybody knows what it means, but everybody knows they don't exist.
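To make the "dragon" concrete: an overlong is a multi-byte pattern whose bits decode to a value that would have fit in fewer bytes, e.g. C0 AF spelling U+002F. The sketch below uses a hypothetical helper, `is_overlong`, that only does the bit arithmetic (it ignores surrogates and the U+10FFFF ceiling). It also shows that a strict decoder (Python's `bytes.decode` here) simply refuses such input.

```python
def is_overlong(seq: bytes) -> bool:
    """Hypothetical helper: True if seq is a lead byte plus continuation
    bytes whose scalar value would fit in a shorter encoding. It does not
    check surrogates or the U+10FFFF ceiling."""
    lead = seq[0]
    if 0xC0 <= lead <= 0xDF:
        length, value, minimum = 2, lead & 0x1F, 0x80
    elif 0xE0 <= lead <= 0xEF:
        length, value, minimum = 3, lead & 0x0F, 0x800
    elif 0xF0 <= lead <= 0xF7:
        length, value, minimum = 4, lead & 0x07, 0x10000
    else:
        return False  # ASCII or a continuation byte: not a multi-byte lead
    if len(seq) != length or any((b & 0xC0) != 0x80 for b in seq[1:]):
        return False  # wrong length, or not followed by continuation bytes
    for b in seq[1:]:
        value = (value << 6) | (b & 0x3F)  # append 6 payload bits per byte
    return value < minimum  # fits in fewer bytes, so the encoding is overlong


def strict_rejects(seq: bytes) -> bool:
    """True if Python's strict UTF-8 decoder refuses the input."""
    try:
        seq.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


print(is_overlong(b"\xc0\xaf"), strict_rejects(b"\xc0\xaf"))
print(is_overlong(b"\xc3\xa9"), strict_rejects(b"\xc3\xa9"))
```

A conforming encoder never produces these bytes, so they only show up as corrupt or hostile input.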

Yes, this is trying to improve the language for a scenario that CANNOT HAPPEN. We're trying to optimize a case for data that implementations should never encounter. It is essentially like optimizing for the case where your input data is actually a dragon and not UTF-8 text.

Since such input is illegal, the "at least one U+FFFD, but as many as you want to emit (or just fail)" rule is fine.
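The permissive rule above can be phrased as a simple check. The sketch below is an illustration, not any standard's conformance test: the function `meets_minimum_rule` and the three Python decoder configurations are assumptions chosen for the example. A decoder passes if it either raises on ill-formed input or emits at least one U+FFFD; how many it emits is deliberately not examined.

```python
from typing import Callable

REPLACEMENT = "\ufffd"


def meets_minimum_rule(decode: Callable[[bytes], str], ill_formed: bytes) -> bool:
    """Hypothetical check: on ill-formed input, a decoder must either fail
    outright or emit at least one U+FFFD. The count is not examined."""
    try:
        text = decode(ill_formed)
    except UnicodeDecodeError:
        return True  # failing on illegal input is acceptable
    return REPLACEMENT in text


decoders = {
    "strict": lambda data: data.decode("utf-8"),
    "replace": lambda data: data.decode("utf-8", errors="replace"),
    "ignore": lambda data: data.decode("utf-8", errors="ignore"),  # drops bytes silently
}

results = {name: meet for name, meet in
           ((n, meets_minimum_rule(fn, b"\xc0\xaf")) for n, fn in decoders.items())}
print(results)
```

Failing and replacing both pass; silently dropping the bytes does not, since it emits no U+FFFD at all.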

-Shawn
Received on Tue May 30 2017 - 12:12:07 CDT
