Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

From: Richard Wordingham via Unicode <unicode_at_unicode.org>
Date: Thu, 18 May 2017 06:01:49 +0100

On Thu, 18 May 2017 02:04:55 +0200
Philippe Verdy via Unicode <unicode_at_unicode.org> wrote:

> I find it intriguing that the update intends to enforce the decoding
> of the **shortest** sequences, but now wants to treat **maximal
> sequences** as a single unit with arbitrary length. UTF-8 was
> designed to work only with some state machines that would NEVER need
> to parse more than 4 bytes.

If you look at the sample code in
http://www.unicode.org/versions/Unicode2.0.0/appA.pdf, you'll see that
it's working with 6-byte sequences. It is the Unicode version of UTF-8,
as opposed to the ISO 10646 version, that has always been restricted to
4 bytes.
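
To make the 4-byte versus 6-byte distinction concrete, here is a minimal
sketch in Python of the sequence length implied by a lead byte under each
scheme. It is not taken from the appendix's sample code; the function
names are hypothetical, and the byte ranges are the commonly documented
ones for the original 1-to-6-byte form and for the current form limited
to U+10FFFF.

```python
def legacy_sequence_length(lead: int) -> int:
    """Length implied by a lead byte in the original 1-to-6-byte scheme.

    Returns 0 if the byte cannot start a sequence (hypothetical helper).
    """
    if lead < 0x80:
        return 1                      # 0xxxxxxx
    if 0xC0 <= lead <= 0xDF:
        return 2                      # 110xxxxx
    if 0xE0 <= lead <= 0xEF:
        return 3                      # 1110xxxx
    if 0xF0 <= lead <= 0xF7:
        return 4                      # 11110xxx
    if 0xF8 <= lead <= 0xFB:
        return 5                      # 111110xx
    if 0xFC <= lead <= 0xFD:
        return 6                      # 1111110x
    return 0                          # continuation byte, or 0xFE/0xFF


def modern_sequence_length(lead: int) -> int:
    """Length implied by a lead byte in UTF-8 restricted to U+10FFFF.

    Returns 0 if the byte cannot start a well-formed sequence
    (hypothetical helper).
    """
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:          # 0xC0 and 0xC1 would only be overlong
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:          # 0xF5 and above would exceed U+10FFFF
        return 4
    return 0


if __name__ == "__main__":
    for lead in (0x41, 0xC0, 0xE2, 0xF4, 0xF8, 0xFC):
        print(hex(lead), legacy_sequence_length(lead),
              modern_sequence_length(lead))
```

A decoder built on the first function has to be prepared to consume up
to 6 bytes per sequence; one built on the second never looks beyond 4.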

Richard.
Received on Thu May 18 2017 - 00:02:26 CDT