From: Kenneth Whistler (kenw@sybase.com)
Date: Thu Mar 03 2005 - 15:49:25 CST
> > However, as we have seen when we created the Unicode
> > Standard, existing character sets have a way of forcing a new
> > standard to be compatible, lest it be a non-starter. The same
> > pressure, magnified, would face any successor standard to the Unicode
> > Standard.
> >
> Ever increasing chaos and entropy ?
No, just lengthening lists of conversion tables that systems
need to support, with old, old ones eventually dropping off the
list as systems retire them for lack of use and as
the archaic data stores that exercised them become unused
and inaccessible.
Many old character sets die off when the data stores and technology
using them become museum pieces. Who worries about the Amiga
character set nowadays? Anything stored only on 5-1/4" floppies
or on 9-track tape is getting to the point where new employees
coming into the office are more likely to take a quick trip to
the dumpster out back rather than try to figure out how to
recover and convert any of the data.
The wide usage of the internet in the last decade has changed
the nature of the process, of course. There is now a kind of
information evolution, where some types of encodings survive
and prosper, and others don't do so well. Unicode will inevitably
come out on top in that, because of the nature of the internet,
but it won't completely displace some dozens of other encodings
that are "fit" enough to survive and propagate into enough
data stores to maintain long-term viability.
>
> I suppose one solution could be to deprecate certain usages and
> characters for a few decades and then hope that by the time the
> successor character encoding comes along people have refrained from
> using the deprecated characters and data have been reconverted in the
> meantime (because, for instance, a change in the formatting or tagging
> language has forced the conversion). Of course, decades in our world of
> immediacy sounds a bit out of this world...
There may be instances of deprecation. We've seen some already
in Unicode. But another way obsolete characters drop
out is simply by disuse and being ignored. Consider the fate
of most C0 and C1 control characters. Unicode is unlikely to
formally deprecate U+0016 <control> (= SYNCHRONOUS IDLE), but you
will have to search a long ways to find any Unicode implementation
or data store that does anything interesting with that other
than faithfully converting 0x16 <--> U+0016 for legacy encodings.
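
As a rough illustration of that "faithful conversion", here is a minimal
Python sketch. The list of codecs is only an example chosen for
illustration, and round_trips is a hypothetical helper name, not part of
any library. It decodes the single byte 0x16 from a few legacy encodings
and checks that it maps to U+0016 and back unchanged.

```python
import unicodedata

# Example legacy encodings (an arbitrary selection for illustration).
LEGACY_CODECS = ["ascii", "latin-1", "cp1252", "koi8_r"]

def round_trips(byte_value, codec):
    """Decode one byte with the given codec, then re-encode it.

    Returns the decoded character and whether re-encoding
    reproduced the original byte.
    """
    raw = bytes([byte_value])
    text = raw.decode(codec)
    return text, text.encode(codec) == raw

for codec in LEGACY_CODECS:
    text, ok = round_trips(0x16, codec)
    # General category "Cc" marks a control character.
    print(codec, f"U+{ord(text):04X}", unicodedata.category(text), ok)
```

Nothing here interprets the character as SYNCHRONOUS IDLE; the code
point is simply carried through, which is about all most implementations
do with it.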
--Ken