Re: Unicode Stability (Was: Re: E0000 Language Tags for Some Obscure Languages)

From: Kenneth Whistler (kenw@sybase.com)
Date: Thu Mar 03 2005 - 15:49:25 CST


    > > However, as we have seen when we created the Unicode
    > > Standard, existing character sets have a way of forcing a new
    > > standard to be compatible, lest it be a non-starter. The same
    > > pressure, magnified, would face any successor standard to the Unicode
    > > Standard.
    > >
    > Ever increasing chaos and entropy ?

    No, just lengthening lists of conversion tables that systems
    need to support, with the oldest ones eventually dropping off
    the list as systems retire them for lack of use and as
    the archaic data stores that exercised them become unused
    and inaccessible.

    Many old character sets die off when the data stores and technology
    using them become museum pieces. Who worries about the Amiga
    character set nowadays? Anything stored only on 5-1/4" floppies
    or on 9-track tape is getting to the point where new employees
    coming into the office are more likely to take a quick trip to
    the dumpster out back rather than try to figure out how to
    recover and convert any of the data.

    The wide usage of the internet in the last decade has changed
    the nature of the process, of course. There is now a kind of
    information evolution, where some types of encodings survive
    and prosper, and others don't do so well. Unicode will inevitably
    come out on top in that, because of the nature of the internet,
    but it won't completely displace some dozens of other encodings
    that are "fit" enough to survive and propagate into enough
    data stores to maintain long-term viability.

    >
    > I suppose one solution could be to deprecate certain usages and
    > characters for a few decades and then hope that by the time the
    > successor character encoding comes along people have refrained from
    > using the deprecated characters and data have been reconverted in the
    > meantime (because, for instance, a change in the formatting or tagging
    > language has forced the conversion). Of course, decades in our world of
    > immediacy sounds a bit out of this world...

    There may be instances of deprecation. We've seen some already
    in Unicode. But another way obsolete characters drop
    out is simply by disuse and being ignored. Consider the fate
    of most C0 and C1 control characters. Unicode is unlikely to
    formally deprecate U+0016 <control> (= SYNCHRONOUS IDLE), but you
    will have to search a long way to find any Unicode implementation
    or data store that does anything interesting with that character
    other than faithfully converting 0x16 <--> U+0016 for legacy encodings.
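
    As a rough illustration of that faithful 0x16 <--> U+0016
    conversion, here is a minimal Python sketch. The three encodings
    and the helper name round_trip are my own choices for the example,
    not anything prescribed by the standard; the only claim is that
    these particular codecs pass the byte through unchanged.

```python
import unicodedata

SYN = 0x16  # SYNCHRONOUS IDLE, a C0 control code

# Illustrative selection of legacy encodings (an assumption for this sketch).
LEGACY_ENCODINGS = ["ascii", "latin-1", "cp1252"]


def round_trip(byte_value, encoding):
    """Decode one legacy byte to Unicode, then encode it back again."""
    char = bytes([byte_value]).decode(encoding)
    return char, char.encode(encoding)


for enc in LEGACY_ENCODINGS:
    char, raw = round_trip(SYN, enc)
    print(f"{enc}: 0x{SYN:02X} -> U+{ord(char):04X} -> 0x{raw[0]:02X}")

# The character carries no semantics beyond "control" (general category Cc)
# and has no formal character name, hence the fallback label.
print(unicodedata.category(chr(SYN)), unicodedata.name(chr(SYN), "<control>"))
```

    Nothing here interprets the character; the codecs merely carry it
    across, which is about all most implementations ever do with it.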

    --Ken


