Re: Running out of code points, redux (was: Re: Feedback on the proposal...)

From: Richard Wordingham via Unicode
Date: Mon, 5 Jun 2017 13:37:16 +0100

On Mon, 5 Jun 2017 13:08:06 +0900
"Martin J. Dürst via Unicode" wrote:

> On 2017/06/02 04:54, Doug Ewell via Unicode wrote:
> > Richard Wordingham wrote:
> >
> >> even supporting 6-byte patterns just in case 20.1 bits eventually
> >> turn out not to be enough,
>
> Sorry to be late with this, but if 20.1 bits turn out to not be
> enough, what about 21 bits?
> That would still limit UTF-8 to four bytes, but would almost double
> the code space. Assuming (conservatively) that it will take about a
> century to fill up all 17 (well, actually 15, because two are
> private) planes, this would give us another century.

It all depends on how the lead byte is parsed. With a block-if
construct written in ignorance of the original design, or with a look-up
table, it may be simplest to treat F5 onwards as out-and-out errors and
not expect any trailing bytes. Code for handling attempted 6-byte
encodings was the most complex case. Of course, one **might** want to
handle a list of mostly small positive integers, at which point the old
UTF-8 design might be useful.
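
To make the two readings of the lead byte concrete, here is a rough
sketch in Python. It is only an illustration, not code from any real
decoder: the function name expected_trail_bytes, the old_design flag and
the two constants are made up for this example, and C0/C1 are rejected
in both modes as a simplifying assumption.

```python
import math

# Illustrative constants (names are hypothetical, chosen for this sketch).
UNICODE_LIMIT = 0x110000    # 17 planes of 0x10000 code points each
FOUR_BYTE_LIMIT = 1 << 21   # a 4-byte form carries 3 + 6 + 6 + 6 = 21 bits


def expected_trail_bytes(lead: int, old_design: bool = False) -> int:
    """Return how many trailing bytes a lead byte announces, or -1 if
    the byte is treated as an outright error with none expected."""
    if lead <= 0x7F:
        return 0            # ASCII, single byte
    if lead <= 0xC1:
        return -1           # 80..BF are trailing bytes; C0/C1 only give overlong forms
    if lead <= 0xDF:
        return 1            # 2-byte sequence
    if lead <= 0xEF:
        return 2            # 3-byte sequence
    if lead <= 0xF4:
        return 3            # 4-byte sequence, reaches U+10FFFF
    if not old_design:
        return -1           # F5 onwards: error, no trailing bytes expected
    if lead <= 0xF7:
        return 3            # 4-byte sequence, up to 21 bits
    if lead <= 0xFB:
        return 4            # 5-byte sequence under the old design
    if lead <= 0xFD:
        return 5            # 6-byte sequence under the old design
    return -1               # FE and FF were never lead bytes


if __name__ == "__main__":
    print(expected_trail_bytes(0xF5))                   # strict reading
    print(expected_trail_bytes(0xF5, old_design=True))  # old-design reading
    print(round(math.log2(UNICODE_LIMIT), 1))           # the "20.1 bits"
    print(FOUR_BYTE_LIMIT / UNICODE_LIMIT)              # "almost double"
```

With old_design left off, everything from F5 up falls into a single
error branch. Turning it on adds the F5..FD branches, and a decoder
would then also have to check the extra trailing bytes.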

Received on Mon Jun 05 2017 - 07:37:48 CDT
