Re: Custom characters (was: Re: Private Use Area in Use)

From: Richard Wordingham <richard.wordingham_at_ntlworld.com>
Date: Thu, 4 Jun 2015 20:36:26 +0100

On Thu, 04 Jun 2015 14:39:27 +0000
David Starner <prosfilaes_at_gmail.com> wrote:

> On Thu, Jun 4, 2015 at 6:09 AM John <idou747_at_gmail.com> wrote:
>
> > Mostly just a matter of upgrading the character size.
>
>
> Which totally blows any concern with text size out of the water.
> Using 30 bytes to define certain very rare characters and 1 byte to
> define ASCII is way better than using 8 bytes to define all
> characters.

The character size can be increased to 64 bits in such a way that no
new surrogates are required, current UTF-8 text remains UTF-8, current
UTF-16 text remains UTF-16 and current UTF-32 remains UTF-32, the
extended UTF-8 still has 8-bit code units, the extended UTF-16 still has
16-bit code units, and the extended UTF-32 still has 32-bit code units. In
fact, the character size can be made unbounded.

The trick is to extend UTF-8 indefinitely, and then for UTF-16 and
UTF-32 repeat the idea of the UTF-8 scheme using sequences of two or
more low surrogates (or two or more high surrogates; one must choose)
much as UTF-8 uses bytes. Tom Bishop publicised the idea.
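Below is a minimal sketch of one way the UTF-16 half of this idea could work. The details are my own assumptions, not necessarily Tom Bishop's design. It uses low surrogates, and each one carries 10 payload bits. In a lead unit, n one-bits are followed by a zero and then 9 - n payload bits. Each of the n - 1 continuation units is '10' followed by 8 payload bits. An n-unit sequence therefore holds 7n + 1 bits, so 9 units cover 64 bits. Extended forms are used only for values above U+10FFFF. A low surrogate that directly follows a high surrogate is still read as the second half of an ordinary pair. The names `encode` and `decode` are hypothetical.

```python
LOW_BASE = 0xDC00  # first low surrogate; DC00..DFFF leave 10 payload bits


def _ext_length(cp: int) -> int:
    """Smallest unit count n >= 2 whose 7n + 1 payload bits hold cp (assumed layout)."""
    n = 2
    while cp.bit_length() > 7 * n + 1:
        n += 1
    if n > 9:
        raise ValueError("this sketch stops at 9 units (64 bits)")
    return n


def encode(cp: int) -> list[int]:
    """Encode one value as 16-bit units: standard UTF-16, else a run of low surrogates."""
    if cp < 0 or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("negative value or surrogate code point")
    if cp <= 0xFFFF:
        return [cp]
    if cp <= 0x10FFFF:  # ordinary surrogate pair, unchanged
        v = cp - 0x10000
        return [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]
    n = _ext_length(cp)
    tail = []
    for _ in range(n - 1):  # continuations: 10xxxxxxxx, i.e. DE00..DEFF
        tail.append(LOW_BASE | 0x200 | (cp & 0xFF))
        cp >>= 8
    prefix = ((1 << n) - 1) << (10 - n)  # n one-bits at the top, then a zero bit
    return [LOW_BASE | prefix | cp] + tail[::-1]


def decode(units: list[int]) -> list[int]:
    """Decode 16-bit units produced by encode(); raises ValueError on bad input."""
    out, i = [], 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:  # high surrogate: must start a standard pair
            if i + 1 >= len(units) or not 0xDC00 <= units[i + 1] <= 0xDFFF:
                raise ValueError("unpaired high surrogate")
            out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        elif 0xDC00 <= u <= 0xDFFF:  # low surrogate with no high: extended lead
            v = u - LOW_BASE
            n = 0
            while n < 10 and v & (0x200 >> n):  # count leading one-bits
                n += 1
            if not 2 <= n <= 9 or i + n > len(units):
                raise ValueError("bad extended lead")
            cp = v & ((1 << (9 - n)) - 1)
            for c in units[i + 1:i + n]:
                if c >> 8 != 0xDE:  # continuation must be DE00..DEFF
                    raise ValueError("bad continuation")
                cp = (cp << 8) | (c & 0xFF)
            if cp <= 0x10FFFF or _ext_length(cp) != n:
                raise ValueError("non-shortest extended form")
            out.append(cp)
            i += n
        else:
            out.append(u)
            i += 1
    return out
```

For example, `encode(0x110000)` gives `[0xDF91, 0xDE00, 0xDE00]`, and `encode(2**64 - 1)` needs nine units. The same unit values could serve as 32-bit code units in an extended UTF-32, which is also an assumption. Unlike UTF-8, this layout is not fully self-synchronizing. The trailing half of an ordinary pair can also fall in DE00..DEFF, so it is distinguished from a continuation only by its context.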

Richard.
Received on Thu Jun 04 2015 - 14:37:27 CDT