RE: Perception that Unicode is 16-bit (was: Re: Surrogate space i

From: Ayers, Mike (Mike_Ayers@bmc.com)
Date: Fri Feb 23 2001 - 17:57:55 EST


> From: John Cowan [mailto:jcowan@reutershealth.com]
>
> Ayers, Mike wrote:
>
> > After
> > all, pretty much every ceiling ever established in
> computing has been broken
> > through, and there is no reason to believe that it won't
> happen again!
>
> On the contrary. There *are* reasons to believe that it won't happen
> in the case of character encoding.

        As well as reasons to believe that it will, as I explain below.

> As for breaking through every ceiling, consider the number of
> different assembly-language op codes. Do you foresee computer chips
> with 65,536 different opcodes? How about 4,294,967,296 distinct
> opcodes? I thought not.

        You thought wrong. Can you say "bit-slice"?

> Or consider IPv6 network addresses. There are
> 340,282,366,920,938,463,463,374,607,431,768,211,456 of them. They
> won't be assigned densely according to current plans, but they
> *could* be, and that would be enough IP addresses to have
> a few billion addresses for every soil bacterium in every
> square centimeter of soil on the planet. Do you really believe
> we are going to "break through" that?

        First - it's possible, but I don't want to go that far off topic.
Second, IPv6 exists largely because we ran out of IPv4 addresses. Claiming
that we haven't exhausted some resource before it's even deployed is silly,
especially when it's being deployed because its predecessor was exhausted.
This is, IMHO, an exact analogy to Unicode.

        Let me make my case a little differently. When ASCII was first
deployed, it was complete for its purpose. In fact, there are even
characters in there (such as "@" and "`") that look suspiciously like they
were added just to fill out the set (i.e. they would not likely appear on a
typewriter of the day, as I recall). So why the move to Unicode? The
*scope* changed! The character set which was well designed to do English
language teletype was being asked to do all sorts of things it had never
before dreamed of - it was overwhelmed. (Oops, I slipped into
anthropomorphization there - sorry...) There was therefore a need for an
encoding that represented more - much more. Enter Unicode - ta daaaa!

        The idea that I am trying to push here is that while Unicode may be
near complete in its current scope, there is no reason to believe that this
scope will not change. In fact, as we watch previously banned musical
notation entered into the repertoire, we should acknowledge that it is
*already* changing - where she stops, nobody knows. I do not say that it
will happen - just that it might and that it wouldn't cost much to be
prepared.

        Since I started monitoring this list last year, the two most
repeated topics (other than AccuSplit pedometers) have been the 16 bit issue
and the naming of the planes. In all that I have read (I couldn't read it
all), the finger inevitably winds up pointing firmly at the Consortium, both
for promoting a 16 bit model, and for being so confusing once that model
didn't fit. In the end, what the supplemental planes (or supplementary
planes, or suppository planes, or whatever the official term is - I can
never remember) and the basic plane really are could be summarized as "the
original 16 bit character set and the 16 other 16 bit character sets". The
resemblance to an Intel 8088 is disturbing.
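
        (For the curious, here is a minimal sketch - mine, purely for
illustration, not anything out of the standard - of how a
supplementary-plane character is reached through UTF-16 surrogate pairs.
The segment:offset flavor of the arithmetic is exactly what invites the
8088 comparison; the Python and the function name are my own choices.)

    # Illustrative sketch only: split a supplementary-plane code point
    # into its UTF-16 surrogate pair, segment:offset style.
    def to_surrogate_pair(code_point):
        assert 0x10000 <= code_point <= 0x10FFFF   # supplementary planes only
        offset = code_point - 0x10000              # 20-bit plane+offset value
        high = 0xD800 + (offset >> 10)             # top 10 bits -> high surrogate
        low  = 0xDC00 + (offset & 0x3FF)           # low 10 bits -> low surrogate
        return high, low

    # U+1D11E MUSICAL SYMBOL G CLEF - one of those musical notation
    # characters - comes out as (0xD834, 0xDD1E):
    print([hex(u) for u in to_surrogate_pair(0x1D11E)])
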

        Why some folk think it is so problematic just to prepare for the
future is beyond me. What little such preparation has been done in the past
has always been rewarded ('486 booster socket excepted).

/|/|ike


