Re: Running out of code points, redux (was: Re: Feedback on the proposal...) from Richard Wordingham via Unicode on 2017-06-01 (Unicode Mail List Archive)

From: Richard Wordingham via Unicode <unicode_at_unicode.org>
Date: Thu, 1 Jun 2017 22:39:12 +0100

On Thu, 01 Jun 2017 12:54:45 -0700
Doug Ewell via Unicode <unicode_at_unicode.org> wrote:

> Richard Wordingham wrote:
>
> > even supporting 6-byte patterns just in case 20.1 bits eventually
> > turn out not to be enough,
>
> Oh, gosh, here we go with this.

You were implicitly invited to argue that there was no need to handle
invalid 5- and 6-byte sequences.
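
For concreteness, here is a rough sketch of what "handling" such sequences
could mean. It assumes the lead-byte patterns of the original 31-bit UTF-8
definition (RFC 2279), and the function names are made up for illustration.
Treating the whole run as one invalid unit is only one possible policy,
not a recommendation.

```python
def legacy_utf8_length(lead):
    """Sequence length implied by a lead byte under the original
    31-bit UTF-8 scheme. Returns 0 for a continuation byte and
    None for 0xFE/0xFF, which never had a defined meaning.
    Looks at the bit pattern only; it does not check for overlongs."""
    if lead <= 0x7F:
        return 1
    if lead <= 0xBF:
        return 0          # 10xxxxxx: continuation byte, not a lead
    if lead <= 0xDF:
        return 2
    if lead <= 0xEF:
        return 3
    if lead <= 0xF7:
        return 4
    if lead <= 0xFB:
        return 5          # 111110xx: invalid in today's UTF-8
    if lead <= 0xFD:
        return 6          # 1111110x: invalid in today's UTF-8
    return None


def long_form_span(data, i):
    """If data[i] is a 5- or 6-byte lead, return how many bytes
    (lead plus following continuation bytes, up to the implied
    length) could be reported as a single invalid unit.
    Returns 0 if data[i] is not such a lead."""
    n = legacy_utf8_length(data[i])
    if n is None or n < 5:
        return 0
    end = i + 1
    while end < len(data) and end < i + n and 0x80 <= data[end] <= 0xBF:
        end += 1
    return end - i
```

A decoder that does not recognise these patterns would instead report the
lead byte and each continuation byte as separate errors.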

> What will we do if 31 bits turn out not to be enough?

A compatible extension of UTF-16 to unbounded length has already been
designed. Prefix bytes 0xFF can be used to extend the length for UTF-8
by 8 bytes at a time. Extending UTF-32 is not beyond the wit of man,
and we know that UTF-16 could have been done better if the need had
been foreseen.
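
For reference, the figures quoted above (20.1 bits and 31 bits) can be
checked with a few lines of arithmetic. This is only a sketch; it assumes
the original 1- to 6-byte UTF-8 bit layout, and payload_bits is a name
invented here.

```python
import math

def payload_bits(n):
    """Scalar-value bits carried by an n-byte sequence in the
    original 1- to 6-byte UTF-8 layout."""
    if not 1 <= n <= 6:
        raise ValueError("the original UTF-8 defined 1 to 6 bytes only")
    if n == 1:
        return 7                      # 0xxxxxxx
    # The lead byte spends n bits on length marking plus one 0 bit,
    # leaving 7 - n payload bits; each trailing byte carries 6 bits.
    return (7 - n) + 6 * (n - 1)

# Bits needed to number the 0x110000 code points U+0000..U+10FFFF.
bits_needed = math.log2(0x110000)

for n in range(1, 7):
    print(n, payload_bits(n))
print(round(bits_needed, 1))
```

So the 4-byte form carries 21 bits, enough for the current code space,
and the 6-byte form carries the 31 bits mentioned above.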

While it seems natural to hold a Unicode scalar value in a single
machine word of some length, this is not necessary, just highly
convenient.

In short, it won't be a big problem intrinsically. The UCD may get a
bit unwieldy, which may be a problem for small systems without Internet
access.

Richard.
Received on Thu Jun 01 2017 - 16:39:39 CDT