Re: Running out of code points, redux (was: Re: Feedback on the proposal...)

From: Richard Wordingham via Unicode <unicode_at_unicode.org>
Date: Fri, 2 Jun 2017 02:45:29 +0100

On Thu, 1 Jun 2017 17:10:54 -0700
Ken Whistler via Unicode <unicode_at_unicode.org> wrote:

> Well, working from the *current* specification:
>
> FC 80 80 80 80 80
> and
> FF FF FF FF FF FF
>
> are equal trash, uninterpretable as *anything* in UTF-8.
>
> By definition D93b, either sequence of bytes, if encountered by a
> conformant UTF-8 conversion process, would be interpreted as a
> sequence of 6 maximal subparts of an ill-formed subsequence.

There is a very good argument that 0xFC and 0xFF are not code units
(D77) - they are not used in the representation of any Unicode scalar
values. By that argument, the two sequences together contain five
maximal subparts (the 0x80 bytes) and seven garbage bytes (one 0xFC
and six 0xFF).
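
To make the two counts concrete, here is a rough Python sketch, not
taken from the standard or from any library. It splits a byte string
into well-formed sequences and maximal subparts using the byte ranges
of Table 3-7, then separately tallies the bytes that never occur in
well-formed UTF-8 (C0, C1, F5..FF). The names NEVER_USED, segment and
count are made up for this illustration, and treating those bytes as
"not code units" is the argument above, not settled wording.

```python
# Bytes that never appear in well-formed UTF-8 (per Table 3-7).
# Treating them as "not code units" is an assumption of this sketch.
NEVER_USED = frozenset({0xC0, 0xC1}) | frozenset(range(0xF5, 0x100))

CONT = (0x80, 0xBF)  # ordinary continuation-byte range


def _trail_ranges(lead):
    """Allowed ranges for the bytes following `lead`, or None if
    `lead` cannot start a well-formed multi-byte sequence."""
    if 0xC2 <= lead <= 0xDF:
        return [CONT]
    if lead == 0xE0:
        return [(0xA0, 0xBF), CONT]
    if 0xE1 <= lead <= 0xEC or 0xEE <= lead <= 0xEF:
        return [CONT, CONT]
    if lead == 0xED:
        return [(0x80, 0x9F), CONT]
    if lead == 0xF0:
        return [(0x90, 0xBF), CONT, CONT]
    if 0xF1 <= lead <= 0xF3:
        return [CONT, CONT, CONT]
    if lead == 0xF4:
        return [(0x80, 0x8F), CONT, CONT]
    return None


def segment(data):
    """Split `data` into ("ok", chunk) well-formed sequences and
    ("bad", chunk) maximal subparts of ill-formed subsequences."""
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            out.append(("ok", data[i:i + 1]))
            i += 1
            continue
        ranges = _trail_ranges(lead)
        if ranges is None:
            # Stray continuation byte or unusable lead: length-one subpart.
            out.append(("bad", data[i:i + 1]))
            i += 1
            continue
        j = i + 1
        for lo, hi in ranges:
            if j < len(data) and lo <= data[j] <= hi:
                j += 1
            else:
                break
        complete = (j - i) == len(ranges) + 1
        out.append(("ok" if complete else "bad", data[i:j]))
        i = j
    return out


def count(data):
    """Return (subparts under the current spec,
               subparts if NEVER_USED bytes are not code units,
               garbage bytes under that second reading)."""
    bad = [chunk for kind, chunk in segment(data) if kind == "bad"]
    garbage = sum(1 for c in bad if len(c) == 1 and c[0] in NEVER_USED)
    return len(bad), len(bad) - garbage, garbage


print(count(bytes.fromhex("FC8080808080")))
print(count(bytes.fromhex("FFFFFFFFFFFF")))
```

Under the current specification each sequence yields six maximal
subparts. Under the alternative reading the first yields five subparts
plus one garbage byte and the second yields six garbage bytes, so five
and seven in total.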

Richard.
Received on Thu Jun 01 2017 - 20:45:53 CDT
