Re: AW: ASCII and Unicode lifespan

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Thu May 19 2005 - 11:20:40 CDT

Next message: Peter Constable: "RE: Stateful encoding mechanisms"

Previous message: Philippe Verdy: "Re: ASCII and Unicode lifespan"
In reply to: Dominikus Scherkl: "AW: AW: ASCII and Unicode lifespan"

From: "Dominikus Scherkl" <lyratelle@gmx.de>
> Naa, I talked about the far, far future where all 17 planes become
> too full (so, how can a new plane be opened when there is no
> space left to define new surrogate pairs in the BMP?)

There are still some free columns in the BMP where new surrogates could be
added if ever needed: just between the existing Hangul Syllables block and
the existing surrogates (D7B0 to D7FF: that's 80 code units).

Using them would require changing UTF-16, or designing a new UTF: these
additional super-surrogates could become the leaders of a 3-surrogate
sequence, followed by the two existing surrogate code units. Each leader
would select 1024 x 1024 code points, i.e. 16 planes, so this would allow
extending the code space with 1280 additional planes (if all the 80
positions are allocated for this usage), and the new codespace would have
a total of 1297 planes (codepoints up to U+510FFFF).
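
To make the arithmetic concrete, here is a minimal Python sketch of such a
3-unit scheme. The bit layout, the constant names and decode_triple() are my
own assumptions for illustration; nothing like this is defined by any
standard. Under this layout the highest reachable code point works out to
U+510FFFF.

```python
# Hypothetical 3-unit extension of UTF-16 (illustration only, not a standard).
LEAD_FIRST, LEAD_LAST = 0xD7B0, 0xD7FF   # assumed "super-surrogate" leaders
HIGH_FIRST, HIGH_LAST = 0xD800, 0xDBFF   # existing high surrogates
LOW_FIRST, LOW_LAST = 0xDC00, 0xDFFF     # existing low surrogates
PLANE_SIZE = 0x10000
BASE = 0x110000                          # first code point past the 17 planes

leaders = LEAD_LAST - LEAD_FIRST + 1               # 80 free code units
per_leader = 1024 * 1024                           # code points per leader
extra_planes = leaders * per_leader // PLANE_SIZE  # 16 planes per leader
total_planes = 17 + extra_planes
max_code_point = total_planes * PLANE_SIZE - 1


def decode_triple(lead: int, high: int, low: int) -> int:
    """Map a (leader, high, low) triple to a code point (assumed layout)."""
    if not LEAD_FIRST <= lead <= LEAD_LAST:
        raise ValueError("not a leader code unit")
    if not HIGH_FIRST <= high <= HIGH_LAST:
        raise ValueError("not a high surrogate")
    if not LOW_FIRST <= low <= LOW_LAST:
        raise ValueError("not a low surrogate")
    return (BASE
            + ((lead - LEAD_FIRST) << 20)
            + ((high - HIGH_FIRST) << 10)
            + (low - LOW_FIRST))


print(leaders, extra_planes, total_planes, hex(max_code_point))
print(hex(decode_triple(0xD7FF, 0xDBFF, 0xDFFF)))
```

The last triple (D7FF, DBFF, DFFF) lands exactly on max_code_point, which is
a quick check that the plane count and the layout agree.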

The other way of extending the codepoint space would be to allocate the new
surrogates in existing supplementary planes (probably in the SSP, i.e. in
plane 14), and then using the existing UTF-16 rules to map them with pairs
of surrogates in the BMP. This approach would be compatible with existing
UTF-16 decoders, even though they will decode the whole sequence of 4
surrogates as two codepoints in the SSP, instead of one codepoint in the
extra planes.
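
A small Python sketch of what an existing decoder would see. The two blocks
at U+E1000 and U+E1400 are purely hypothetical positions that I picked for
the example (no such allocation exists), and combine() is an assumed
mapping, not a defined one.

```python
import struct

# Hypothetical super-surrogate blocks in plane 14 (illustration only).
XHIGH_FIRST = 0xE1000   # assumed block of 1024 "extended high" code points
XLOW_FIRST = 0xE1400    # assumed block of 1024 "extended low" code points
BASE = 0x110000         # first code point past the current 17 planes


def combine(xhigh: int, xlow: int) -> int:
    """Assumed mapping of a plane-14 pair to one extended code point."""
    return BASE + ((xhigh - XHIGH_FIRST) << 10) + (xlow - XLOW_FIRST)


# Encode the pair with the ordinary, unmodified UTF-16 rules.
text = chr(XHIGH_FIRST) + chr(XLOW_FIRST)
data = text.encode("utf-16-be")
units = struct.unpack(">4H", data)      # four 16-bit code units, 8 bytes

# An existing decoder yields two code points in plane 14 ...
decoded = data.decode("utf-16-be")
print([hex(u) for u in units])
print([hex(ord(c)) for c in decoded])

# ... while an extended decoder would fold them into a single code point.
print(hex(combine(ord(decoded[0]), ord(decoded[1]))))
```

The point is only that the byte stream stays well-formed UTF-16 for today's
software; the extra interpretation step sits on top of it.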

The bad thing about this approach is the total encoding length (4 UTF-16
code units, i.e. 8 bytes for a single codepoint that would be at most 31 bits
wide), but as we are speaking about a far, far future, will it really be a
problem on future platforms, and won't UTF-16 have long since been deprecated
in favor of UTF-32?

At the same time, UTF-8 and UTF-32 would need to be extended and given new
identifiers (like UXF-8, UXF-16, UXF-32) to cope with the extended code
point space.

So I don't think we have any real limitations for now, and there is ample
space for the future...


This archive was generated by hypermail 2.1.5 : Thu May 19 2005 - 15:09:34 CDT