Re: Java char and Unicode 3.0+ (was:Canonical equivalence in rendering: mandatory or recommended?)

From: John Cowan
Date: Thu Oct 16 2003 - 06:25:00 CST

Philippe Verdy scripsit:

> I am also doubting, but I would not bet on it. After all, when Unicode
> started, a single plane was considered waaaaaay more than sufficient too.

I not only would bet on it, I actually have a bet on it. Henry Thompson
of the W3C's Schema WG bet me that we'd outrun the existing planes within
five years; four left to go and no sign of it, even if Michael Everson
were to achieve pluripresence and actually get everything accepted into
the standard that he knows needs to be done.

> But the objectives of Unicode have changed, and now Unicode must cooperate
> with ISO 10646, which has its own objectives too.

There are only so many character standardizers on the planet, and the UTC
and WG2 are joined at the hip for the simple reason that they mostly
consist of the same people.

> Of course Unicode's objectives are focused on text, but there is enough
> work in other media-related technologies that may in the future justify
> encoding explicitly attributed characters, distinct from a sequence of
> format controls and characters, or simply offering compatibility with
> standards other than ISO 10646.

Dream on.

> At some point there may be a need to define new classes of encoded
> "characters" that would require a complex mapping onto a large number of
> new code points, simply because the ISO committee will want to support
> the work of some of the many other ISO committees.

WG2 has indeed been merging the existing character standards of other ISO
committees, and a good many of the Unicode 4.0 characters actually
come from there. It's a trickle and will remain so.

> But just look at the rapid growth of encoded characters in Unicode: in just
> 4 years, 2 new planes have been nearly allocated or reserved for extension.

Not new, not unforeseen. Multiple planes have been formally part of
Unicode since 1996, and the need was established several years before that.

> Also, there is a lot of pressure everywhere to define new vendor-specific
> characters for specific uses, and the PUA ranges may eventually accept a
> reservation mechanism for third parties so that their assignments don't
> collide with each other, similar to the way IPv4 address space is
> reserved on the Internet.

Not going to happen.

> What would happen if ISO 10646 decided to stop its work and leave it to
> IANA to contract with external registrars, just to meet a rapid industry
> need to publish more media and still interoperate? There might be some
> regulated areas in the new scheme that Unicode would continue to work on
> (the 17 planes), but other parts documented and implemented elsewhere,
> over which Unicode would have no control, and for which there might be
> a compatibility scheme.

There's no shortage of integers.

There is / One art                      John Cowan
No more / No less             
To do / All things            
With art- / Lessness                     -- Piet Hein
