Re: Code pages and Unicode (wasn't really: RE: Endangered Alphabets)

From: Asmus Freytag
Date: Fri, 19 Aug 2011 17:18:21 -0700

On 8/19/2011 2:35 PM, Jukka K. Korpela wrote:
> 20.8.2011 0:07, Doug Ewell wrote:
>> Of course, 2.1 billion characters is also overkill, but the advent of
>> UTF-16 was how we ended up with 17 planes.
> And now we think that a little over a million is enough for everyone,
> just as they thought in the late 1980s that 16 bits is enough for
> everyone.

The difference is that these early plans were based on rigorously *not*
encoding certain characters, and on using combining methodology or
variation selection much more aggressively. That might have been more feasible,
except for the needs of migrating software and having Unicode-based
systems play nicely in a world where character sets had different ideas
of what constitutes a character.

Allowing thousands of characters for compatibility reasons, more than
ten thousand precomposed characters, and many types of other characters
and symbols not originally on the radar still has not inflated the
numbers all that much. The count stands at roughly double the original
16-bit goal, after over twenty years of steady accumulation.

Was the original concept of being able to shoehorn the world into
sixteen bits overly aggressive? Probably, because the estimates had
always been that there are about a quarter million written "elements".
If you took the current repertoire and used code-space saving techniques
in hindsight, you might be able to create something that "fits" into
16 bits. But it would end up using strings for many things that are now
single characters.
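
To make the "strings instead of single characters" point concrete, here is
a minimal sketch using Python's standard unicodedata module. The two sample
characters and the helper name decompose() are my own choices for
illustration, not part of any proposed design; canonical decomposition (NFD)
merely stands in for what a more aggressive combining-based encoding would
have required everywhere.

```python
import unicodedata


def decompose(ch: str) -> str:
    """Return the canonical decomposition (NFD) of a string.

    Hypothetical helper for illustration: it shows the sequence a
    precomposed character would become if only its parts were encoded.
    """
    return unicodedata.normalize("NFD", ch)


# U+00E9 LATIN SMALL LETTER E WITH ACUTE: one code point today.
e_acute = "\u00e9"
e_acute_parts = decompose(e_acute)

# U+D55C, a precomposed Hangul syllable: one code point today.
hangul = "\ud55c"
hangul_parts = decompose(hangul)

for label, single, parts in (
    ("e with acute", e_acute, e_acute_parts),
    ("Hangul syllable", hangul, hangul_parts),
):
    code_points = " ".join(f"U+{ord(c):04X}" for c in parts)
    print(f"{label}: {len(single)} code point -> {len(parts)}: {code_points}")
```

The accented letter becomes a base letter plus a combining mark, and the
Hangul syllable becomes three conjoining jamo. Multiply that by the more than
ten thousand precomposed characters and you can see where the code space
would be saved, and what it would cost in string handling.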

But the numbers, so far, show that this original estimate of a quarter
million, rough as it was, appears to be rather accurate. Over twenty
years of encoding characters have not been enough to exceed that.

The million code points are therefore a much more comfortable "limit"
and, from the beginning, assume a ceiling that has ample head-room (as
opposed to the "can we fit the world in this shoebox" approach of
earlier designs).
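
For anyone who wants the arithmetic behind these figures, a small sketch
follows. The variable names are mine, and the quarter-million figure is only
the rough estimate mentioned above, not a measured count.

```python
# Sizes of the code spaces discussed in this thread.
BMP_SIZE = 2 ** 16                 # the original 16-bit design: 65,536
UCS4_SIZE = 2 ** 31                # the 31-bit space: about 2.1 billion

# UTF-16 reaches beyond the BMP with surrogate pairs:
# 1,024 high surrogates x 1,024 low surrogates.
SUPPLEMENTARY = 1024 * 1024        # 1,048,576 code points in 16 planes

PLANES = 17
CODE_SPACE = PLANES * BMP_SIZE     # 1,114,112 code points in total

# The 2,048 surrogate code points cannot be assigned to characters.
SCALAR_VALUES = CODE_SPACE - 2048  # 1,112,064

# Rough estimate of written "elements" (an assumption, per the text above).
ESTIMATE = 250_000

print(f"16-bit space:        {BMP_SIZE:>13,}")
print(f"17 planes:           {CODE_SPACE:>13,}")
print(f"31-bit space:        {UCS4_SIZE:>13,}")
print(f"estimate / 16-bit:   {ESTIMATE / BMP_SIZE:.1f}x (does not fit)")
print(f"17 planes / estimate: {CODE_SPACE / ESTIMATE:.1f}x (head-room)")
```

So the estimate overshoots a 16-bit space almost four times over, while the
17 planes leave room for it more than four times over.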

So, no, the two cases are not really comparable.

Received on Fri Aug 19 2011 - 19:20:32 CDT
