Question about Perl5 extended UTF-8 design

From: Karl Williamson <>
Date: Thu, 5 Nov 2015 08:57:16 -0700


Several of us are wondering about the reason for reserving bits for the
extended UTF-8 in perl5. I'm asking you because you are the apparent
author of the commits that did this.

To refresh your memory, in perl5 UTF-8, a start byte of 0xFF causes the
length of the sequence of bytes that comprise a single character to be
13 bytes. This allows code points up to 2**72 - 1 to be represented.
If the length had instead been 12 bytes, code points up to 2**66 - 1
could be represented, which is already enough for any code point that
fits in a 64-bit word.
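
For anyone who wants to check the arithmetic, here is a minimal Python
sketch. It assumes that the 0xFF start byte carries no payload bits and
that each continuation byte carries 6, which is what the figures above
imply; the function name is made up for illustration and is not anything
in the perl source.

```python
def payload_bits(sequence_length, bits_per_continuation=6):
    """Payload bits in a sequence whose start byte (0xFF) carries none.

    Assumption: every byte after the start byte is a continuation byte
    contributing `bits_per_continuation` bits of the code point.
    """
    return (sequence_length - 1) * bits_per_continuation


for length in (11, 12, 13):
    bits = payload_bits(length)
    # A 64-bit word needs at least 64 payload bits.
    fits_64 = bits >= 64
    print(f"{length} bytes: {bits} bits, max 2**{bits} - 1, "
          f"holds any 64-bit value: {fits_64}")
```

Under those assumptions, 12 bytes is the shortest length that covers a
64-bit word, and the 13th byte adds 6 further bits beyond that.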

The comments indicate that these extra bits are "reserved". So we're
wondering what potential use you had thought of for these bits.


Karl Williamson
