Re: Running out of code points, redux (was: Re: Feedback on the proposal...)

From: Richard Wordingham via Unicode
Date: Thu, 1 Jun 2017 22:39:12 +0100

On Thu, 01 Jun 2017 12:54:45 -0700
Doug Ewell via Unicode wrote:

> Richard Wordingham wrote:
> > even supporting 6-byte patterns just in case 20.1 bits eventually
> > turn out not to be enough,
> Oh, gosh, here we go with this.

You were implicitly invited to argue that there was no need to handle
5- and 6-byte invalid sequences.
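
For concreteness, here is a minimal sketch of the first step in handling
such sequences: classifying a lead byte by the length that the original
UTF-8 bit patterns (those of RFC 2279) would imply, so that a decoder can
recognise a 5- or 6-byte pattern as such. The name implied_length is
hypothetical, and whether the whole pattern or only the lead byte should
then be treated as the invalid unit is a policy choice that this sketch
does not settle.

```python
def implied_length(lead: int) -> int:
    """Return the sequence length implied by a UTF-8 lead byte.

    Lengths 5 and 6 come from the original bit patterns and are invalid
    in current UTF-8. Returns 0 for bytes that cannot start a sequence
    (continuation bytes, 0xFE and 0xFF).

    This looks at the bit pattern only: 0xC0, 0xC1 and 0xF5..0xF7 match
    the 2- and 4-byte patterns but are never valid in current UTF-8.
    """
    if not 0 <= lead <= 0xFF:
        raise ValueError("lead must be a byte value")
    if lead < 0x80:
        return 1  # 0xxxxxxx
    if lead < 0xC0:
        return 0  # 10xxxxxx, a continuation byte
    if lead < 0xE0:
        return 2  # 110xxxxx
    if lead < 0xF0:
        return 3  # 1110xxxx
    if lead < 0xF8:
        return 4  # 11110xxx
    if lead < 0xFC:
        return 5  # 111110xx, obsolete 5-byte pattern
    if lead < 0xFE:
        return 6  # 1111110x, obsolete 6-byte pattern
    return 0      # 0xFE and 0xFF never start a sequence
```

A decoder that wants to report one error per ill-formed pattern could use
the result to decide how many following continuation bytes to examine.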

> What will we do if 31 bits turn out not to be enough?

A compatible extension of UTF-16 to unbounded length has already been
designed. Prefix bytes 0xFF can be used to extend the length for UTF-8
by 8 bytes at a time. Extending UTF-32 is not beyond the wit of man,
and we know that UTF-16 could have been done better if the need had
been foreseen.
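
As a rough check on the figures quoted above, the sketch below works out
where "20.1 bits" and "31 bits" come from: the first is the base-2
logarithm of the size of the current code space (U+0000..U+10FFFF), the
second is the payload of a 6-byte sequence under the original UTF-8 bit
patterns. The names CODE_SPACE and payload_bits are only illustrative,
and the sketch says nothing about how any of the extensions mentioned
above are actually laid out.

```python
import math

# Current Unicode code space: U+0000..U+10FFFF, i.e. 0x110000 code points.
CODE_SPACE = 0x110000


def payload_bits(length: int) -> int:
    """Payload bits of a UTF-8 style sequence of the given length (1..6),
    using the original bit patterns.

    A 1-byte sequence carries 7 bits. For n >= 2 the lead byte carries
    7 - n bits and each of the n - 1 continuation bytes carries 6.
    """
    if not 1 <= length <= 6:
        raise ValueError("length must be in 1..6")
    if length == 1:
        return 7
    return (7 - length) + 6 * (length - 1)


# Bits needed to number every code point in the current code space.
bits_needed = math.log2(CODE_SPACE)

# Payload available at each sequence length, 1 through 6.
table = {n: payload_bits(n) for n in range(1, 7)}
```

Here bits_needed comes out just under 20.1, and table maps the lengths
1 to 6 to 7, 11, 16, 21, 26 and 31 bits respectively.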

While it seems natural to hold a Unicode scalar value in a single
machine word of some length, this is not necessary, just highly

In short, it won't be a big problem intrinsically. The UCD may get a
bit unwieldy, which may be a problem for small systems without Internet

Received on Thu Jun 01 2017 - 16:39:39 CDT
