Question about Perl5 extended UTF-8 design from Karl Williamson on 2015-11-05 (Unicode Mail List Archive)

From: Karl Williamson <public_at_khwilliamson.com>
Date: Thu, 5 Nov 2015 08:57:16 -0700

Hi,

Several of us are wondering why bits were reserved in perl5's extended
UTF-8. I'm asking you because you appear to be the author of the
commits that did this.

To refresh your memory, in perl5 UTF-8, a start byte of 0xFF causes the
length of the sequence of bytes that comprise a single character to be
13 bytes. This allows code points up to 2**72 - 1 to be represented.
If the length had been instead 12 bytes, code points up to 2**66 - 1
could be represented, which is enough to represent any code point
possible in a 64-bit word.
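
As a quick check of that arithmetic, here is a minimal sketch. It assumes
each continuation byte carries 6 payload bits, as in standard UTF-8, and
that the 0xFF start byte carries none. That assumption is consistent with
the figures above but is not taken from the perl source, and the names
used are purely illustrative.

```python
# Assumption: every continuation byte has the form 10xxxxxx (6 payload bits).
CONT_BITS = 6


def max_code_point(total_bytes: int, start_payload_bits: int = 0) -> int:
    """Largest value representable by a sequence of total_bytes bytes.

    start_payload_bits is the number of payload bits in the start byte;
    it is assumed to be 0 for a 0xFF start byte.
    """
    bits = start_payload_bits + CONT_BITS * (total_bytes - 1)
    return (1 << bits) - 1


# 13 bytes: 12 continuation bytes * 6 bits = 72 bits.
assert max_code_point(13) == 2**72 - 1
# 12 bytes: 11 continuation bytes * 6 bits = 66 bits.
assert max_code_point(12) == 2**66 - 1
# 66 bits already cover every value that fits in a 64-bit word.
assert max_code_point(12) >= 2**64 - 1
```

Under these assumptions the 13-byte form leaves 72 - 64 = 8 bits beyond
a 64-bit word, where a 12-byte form would leave only 2.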

The comments indicate that these extra bits are "reserved". So we're
wondering what potential use you had thought of for these bits.

Thanks

Karl Williamson
Received on Thu Nov 05 2015 - 10:01:20 CST
