Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

From: Doug Ewell via Unicode <unicode_at_unicode.org>
Date: Tue, 30 May 2017 17:41:13 -0600

That's not at all the same as saying it was a valid sequence. That's saying decoders were allowed to be lenient with invalid sequences.
We're supposed to be comfortable with standards language here. Do we really not understand this distinction?

--Doug Ewell | Thornton, CO, US | ewellic.org
-------- Original message --------
From: Karl Williamson <public_at_khwilliamson.com>
Date: 5/30/17 16:32 (GMT-07:00)
To: Doug Ewell <doug_at_ewellic.org>, Unicode Mailing List <unicode_at_unicode.org>
Subject: Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8
On 05/30/2017 02:30 PM, Doug Ewell via Unicode wrote:
> L2/17-168 says:
>
> "For UTF-8, recommend evaluating maximal subsequences based on the
> original structural definition of UTF-8, without ever restricting trail
> bytes to less than 80..BF. For example: <C0 AF> is a single maximal
> subsequence because C0 was originally a lead byte for two-byte
> sequences."
>
> When was it ever true that C0 was a valid lead byte? And what does that
> have to do with (not) restricting trail bytes?

Until TUS 3.1, it was legal for UTF-8 parsers to treat the sequence <C0 AF> as U+002F.
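[Editor's note: a minimal sketch, not from the thread, of why a lenient pre-3.1 decoder could read <C0 AF> as U+002F. C0 matches the structural pattern of a two-byte lead (110xxxxx) and AF that of a trail byte (10xxxxxx), so the payload bits decode to 0x2F — an overlong encoding of '/'. The function name is illustrative only.]

```python
def decode_two_byte(lead: int, trail: int) -> int:
    """Decode a structurally valid two-byte UTF-8 sequence, without
    the overlong check that TUS 3.1 made mandatory."""
    assert lead & 0b1110_0000 == 0b1100_0000, "not a two-byte lead (110xxxxx)"
    assert trail & 0b1100_0000 == 0b1000_0000, "not a trail byte (10xxxxxx)"
    # 5 payload bits from the lead, 6 from the trail.
    return ((lead & 0b0001_1111) << 6) | (trail & 0b0011_1111)

cp = decode_two_byte(0xC0, 0xAF)
print(hex(cp))  # 0x2f, i.e. U+002F '/'

# Modern strict decoders reject the same bytes as ill-formed:
try:
    b"\xc0\xaf".decode("utf-8")
except UnicodeDecodeError as e:
    print("rejected:", e.reason)
```

Because <C0 AF> fits the original structural definition (lead byte plus one trail byte), the proposal treats the whole pair as one maximal subsequence — hence a single U+FFFD — rather than two.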
Received on Tue May 30 2017 - 18:42:10 CDT
