Re: Code pages and Unicode

From: Ken Whistler <kenw_at_sybase.com>
Date: Wed, 24 Aug 2011 12:40:54 -0700

On 8/24/2011 10:48 AM, Richard Wordingham wrote:
> Those are two different claims. 'Never say never' is a useful maxim.

So is "Leave well enough alone."

The problem would be in using maxims instead of an analysis of
engineering requirements to drive architectural decisions.

> The extension of UCS-2, namely UTF-16, is far from optimal, but it
> could have been a lot worse - at least the surrogates are contiguous.
> All I ask is that we have a reasonable way of extending it

Why?

> if, say,
> code points are squandered.

Oh.

Well, in that case, the correct action is to work to ensure that code
points are not squandered.

> I think, however, that <high><high><rare BMP code><low> offers a
> legitimate extension mechanism

One could argue about the description as "legitimate". It is clearly not
conformant, and would require a decision about an architectural change
to the standard. I see no chance of that happening for either the
Unicode Standard or 10646.
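
To make the conformance point concrete, here is a minimal sketch of a
strict UTF-16 decoder. It is an illustration only, not text from either
standard: the function name decode_utf16_units is made up, and 0xFFEF is
an arbitrary stand-in for the "rare BMP code". It shows that one
surrogate pair reaches exactly U+10FFFF (the end of plane 16), and that
a sequence of the proposed shape is rejected because its first high
surrogate is not followed by a low surrogate.

```python
def decode_utf16_units(units):
    """Strictly decode a list of 16-bit code units into code points.

    Hypothetical helper for illustration; raises ValueError on any
    unpaired surrogate instead of substituting U+FFFD.
    """
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:  # high (lead) surrogate
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                lo = units[i + 1]
                # Standard surrogate-pair arithmetic.
                out.append(0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00))
                i += 2
                continue
            raise ValueError(f"unpaired high surrogate at index {i}")
        if 0xDC00 <= u <= 0xDFFF:  # low (trail) surrogate with no lead
            raise ValueError(f"unpaired low surrogate at index {i}")
        out.append(u)  # ordinary BMP code unit
        i += 1
    return out


# Largest code point reachable with a single surrogate pair.
max_cp = decode_utf16_units([0xDBFF, 0xDFFF])[0]

# A sequence of the proposed <high><high><rare BMP code><low> shape.
# 0xFFEF is an assumed placeholder, not a value from the proposal.
proposed = [0xD800, 0xD800, 0xFFEF, 0xDC00]
```

Calling decode_utf16_units(proposed) raises ValueError at index 0, which
is how a decoder written to the current definition of UTF-16 would see
such a sequence.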

> that can
> actually safely be ignored when scattering code assignments about the
> 17 planes (of which only 2 are full).

A quibble (I know), but only 1 plane is arguably "full". Or, if you
count PUA, then *3* planes are "full".

Here are the current stats for the forthcoming Unicode 6.1, counting
*designated* code points (as opposed to assigned graphic characters).

Plane 0: 63,207 / 65,536 = 96.45% full
Plane 1: 7,497 / 65,536 = 11.44% full
Plane 2: 47,626 / 65,536 = 72.67% full (plane reserved for CJK ideographs)
Plane 14: 339 / 65,536 = 0.52% full
Plane 15: 65,536 / 65,536 = 100% full (PUA)
Plane 16: 65,536 / 65,536 = 100% full (PUA)
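
The percentages above can be rechecked with a short sketch. The counts
are copied from the list in this message; the names PLANE_SIZE,
designated and percent_full are made up for the example.

```python
PLANE_SIZE = 0x10000  # 65,536 code points per plane

# Designated code points per plane, as listed in this message.
designated = {0: 63207, 1: 7497, 2: 47626, 14: 339, 15: 65536, 16: 65536}


def percent_full(count, size=PLANE_SIZE):
    """Return the share of a plane that is designated, in percent."""
    return round(100 * count / size, 2)


for plane, count in designated.items():
    print(f"Plane {plane}: {count:,} / {PLANE_SIZE:,} = "
          f"{percent_full(count):.2f}% full")
```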

--Ken
Received on Wed Aug 24 2011 - 14:44:01 CDT
