Re: ASCII and Unicode lifespan

From: David Starner
Date: Wed May 18 2005 - 22:38:10 CDT


    On 5/18/05, Alexander Kh. wrote:
    > not to mention
    > the fact that local encodings are more well-thought in design.

    That's absurd, considering that most local encodings in use were the
    basis for the Unicode encoding of that script--in fact, many
    complexities of Unicode can be attributed to a need for compatibility
    with those local encodings--or were designed as a subset of Unicode.
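    As a small illustration of that inheritance, here is a sketch using
    only Python's built-in codecs. It checks two cases: ISO 8859-1, whose
    byte values equal their Unicode code points, and the basic Russian
    letters of ISO 8859-5, which sit at one fixed offset from the Unicode
    Cyrillic block. The helper name code_point is made up for this example.

```python
# Sketch: two legacy 8-bit encodings whose layout Unicode preserved.

def code_point(byte_value, encoding):
    """Decode one byte in `encoding` and return its Unicode code point."""
    return ord(bytes([byte_value]).decode(encoding))

# ISO 8859-1 (Latin-1): every byte maps to the code point of the same value.
latin1_identity = all(code_point(b, "latin-1") == b for b in range(256))

# ISO 8859-5: bytes 0xB0..0xEF (the basic Russian letters) all differ from
# their Unicode code points U+0410..U+044F by the same amount.
cyrillic_offsets = {code_point(b, "iso8859_5") - b for b in range(0xB0, 0xF0)}

print(latin1_identity)
print({hex(offset) for offset in cyrillic_offsets})
```

    A single offset in the second set means the letters were carried over
    in their original order rather than rearranged.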

    > Also, consider this idea: how about using a code for "shift" key which will reduce
    > in 2 usage of code space.

    No one cares. Really. If you want something like this, look up SCSU
    (the Standard Compression Scheme for Unicode) on the Unicode website.
    But cases where the wasted space matters and standard compression
    algorithms can't be used are rare. Adding complexity to save a few
    bytes isn't a good trade-off.
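    To put rough numbers on this, here is a sketch that compares raw and
    zlib-compressed sizes of the same Russian text in an 8-bit encoding,
    UTF-8 and UTF-16. It does not implement SCSU; zlib stands in for
    "standard compression algorithms". The sample text is invented and
    highly repetitive, so real data will compress less well.

```python
import zlib

# Hypothetical sample: one Russian sentence repeated many times. Real data
# is far less repetitive, so treat the compressed sizes as optimistic.
text = "Съешь ещё этих мягких французских булок, да выпей чаю. " * 200

encoded = {
    "cp1251": text.encode("cp1251"),      # legacy 8-bit encoding
    "utf-8": text.encode("utf-8"),
    "utf-16": text.encode("utf-16-le"),   # 16 bits per character here
}

# Compress each encoded form with the default zlib settings.
compressed = {name: zlib.compress(raw) for name, raw in encoded.items()}

for name, raw in encoded.items():
    print(f"{name:8} raw={len(raw):6} compressed={len(compressed[name]):6}")
```

    On this sample the 16-bit form is twice the size of the 8-bit form
    before compression, and much smaller than either once compressed.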

    > Consider this example: suppose I have a bilingual database: English-Russian for
    > example. I am not planning to use all the Chinese Hieroglyphs, so why would I use
    > 16-bit characters???

    There is no 8-bit character set that supports both English and
    Russian; the standard Russian character sets don't support accented
    English characters. Besides which, it's rare that you have a large
    stream of "English" data without any Spanish, French or German. I'm
    sure Serbian, Ukrainian and other odd letters slip into Russian text
    in names and in other ways.
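    A quick way to see the gap is to try encoding a few strings, sketched
    here with Python's built-in codecs. KOI8-R stands in for "a standard
    Russian character set"; the helper can_encode and the sample strings
    are made up for this example.

```python
def can_encode(text, encoding):
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# Hypothetical sample strings of the kind a bilingual database might hold.
samples = {
    "plain English": "resume",
    "accented English": "r\u00e9sum\u00e9",   # résumé
    "Russian": "Привет",
    "Ukrainian name": "Київ",
}

for label, text in samples.items():
    print(f"{label:17} koi8-r={can_encode(text, 'koi8_r')!s:5} "
          f"utf-8={can_encode(text, 'utf-8')}")
```

    The accented and Ukrainian samples fail in KOI8-R, while UTF-8
    accepts all four.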

    Besides which, it's painful to handle a huge collection of encodings
    and constantly have to convert between them (which always fails in
    some way, because two 8-bit encodings never have one-to-one mappings).
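    The mismatch can be counted directly. This sketch lists the characters
    that one 8-bit codec defines and another cannot represent, using two
    Cyrillic codecs that ship with Python (cp1251 and koi8_r). The function
    name unmappable is made up for this example.

```python
def unmappable(source, target):
    """List characters defined in the 8-bit codec `source` that the codec
    `target` cannot encode."""
    missing = []
    for byte_value in range(256):
        try:
            char = bytes([byte_value]).decode(source)
        except UnicodeDecodeError:
            continue  # byte is unassigned in the source encoding
        try:
            char.encode(target)
        except UnicodeEncodeError:
            missing.append(char)
    return missing

lost_to_koi8 = unmappable("cp1251", "koi8_r")
lost_to_cp1251 = unmappable("koi8_r", "cp1251")

print(len(lost_to_koi8), "cp1251 characters have no KOI8-R equivalent")
print(len(lost_to_cp1251), "KOI8-R characters have no cp1251 equivalent")
```

    Both lists are non-empty, so a conversion in either direction has to
    drop or substitute something. Converting either codec to UTF-8 loses
    nothing.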

    > And also, every script has its own particular properties, for example, letter ordering,
    > case sensitivity, numeric systems et.c. It will be difficult to maintain all those
    > special particularities of every script in a rigid standard anyway. This will result
    > in big overhead, requiring huge amounts of programming and resources to map all those
    > orderings and other particularities into one standard interface. The local encodings
    > are aware of those particularities and are designed for a particular purpose each.

    Local encodings aren't aware of anything. Code is "aware" of those
    particularities, and all the local encodings do is make the code more
    complex. Unicode lets the code handle those particularities as
    consistently as possible.
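    As a sketch of what "consistently" can look like, the Python snippet
    below runs the same string methods and unicodedata calls over Latin,
    Cyrillic and Greek text, and over two different numeral systems. The
    sample words are arbitrary. Locale-specific ordering (collation) needs
    more than this and is not shown.

```python
import unicodedata

# One code path for case mapping, whatever the script.
words = ["Straße", "Привет", "Ελλάδα"]
upper = [w.upper() for w in words]
folded = [w.casefold() for w in words]   # for case-insensitive comparison

for original, up, low in zip(words, upper, folded):
    print(original, up, low)

# Numeric values come from the same character database.
arabic_indic = int("\u0663\u0664")            # ARABIC-INDIC DIGITS THREE, FOUR
roman_eight = unicodedata.numeric("\u2167")   # ROMAN NUMERAL EIGHT
print(arabic_indic, roman_eight)

# Every character carries a name and a general category.
print(unicodedata.name("Ж"), unicodedata.category("Ж"))
```

    None of these calls needs to know which legacy encoding the text
    originally came from.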
