From: David Starner (prosfilaes@gmail.com)
Date: Wed May 18 2005 - 22:38:10 CDT
On 5/18/05, Alexander Kh. <alexkh@writeme.com> wrote:
> not to mention
> the fact that local encodings are more well-thought in design.
That's absurd, considering that most local encodings in use were the
basis for the Unicode encoding of that script--in fact, many
complexities of Unicode can be attributed to a need for compatibility
with those local encodings--or were designed as a subset of Unicode.
> Also, consider this idea: how about using a code for "shift" key which will reduce
> in 2 usage of code space.
No one cares. Really. If you want something like this, look up SCSU on
the Unicode website. But cases where the amount of space wasted is
important and standard compression algorithms can't be used are rare.
Adding additional complexity to save a few bytes isn't a good
trade-off.
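As a rough illustration of the compression point, here is a minimal
Python sketch. The sample string and the choice of zlib are my own
assumptions, and the repeated phrase is far more compressible than real
data, so treat the numbers as indicative only.

```python
import zlib

# Hypothetical sample: a short English-Russian phrase repeated to mimic bulk data.
sample = "Hello, world! Привет, мир! " * 200

sizes = {}
for encoding in ("koi8_r", "utf-8", "utf-16-le"):
    raw = sample.encode(encoding)
    # Store (uncompressed size, zlib-compressed size) in bytes.
    sizes[encoding] = (len(raw), len(zlib.compress(raw)))

for encoding, (raw_size, packed_size) in sizes.items():
    print(f"{encoding:10} raw={raw_size:6} compressed={packed_size}")
```

With an ordinary compressor in the pipeline, the gap between the 8-bit
and the Unicode forms should shrink to very little, which is why a
custom "shift" scheme seldom pays for itself.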
> Consider this example: suppose I have a bilingual database: English-Russian for
> example. I am not planning to use all the Chinese Hieroglyphs, so why would I use
> 16-bit characters???
There is no 8-bit character set that supports both English and
Russian; the standard Russian character sets don't support accented
English characters. Besides which, it's rare that you have a large
stream of "English" data without any Spanish, French or German. I'm
sure Serbian, Ukrainian and other non-Russian letters slip into Russian
text in names and in other ways.
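A small Python sketch of the coverage problem, using the codecs that
ship with Python. The helper name can_encode and the sample words are
assumptions chosen for illustration.

```python
def can_encode(text, encoding):
    """Return True if every character of text exists in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# Hypothetical samples: an accented English word, a Ukrainian name, a Serbian name.
samples = ["résumé", "Київ", "Ђорђе"]

for encoding in ("koi8_r", "iso8859_5", "cp1251", "utf-8"):
    results = {word: can_encode(word, encoding) for word in samples}
    print(encoding, results)
```

None of the three Cyrillic 8-bit sets can hold the accented English
word, and KOI8-R also lacks the Ukrainian and Serbian letters.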
Besides which, it's painful to handle a huge collection of encodings
and constantly have to do interconversions (which always fail in some
way, because two 8-bit encodings never have one-to-one mappings.)
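To make the conversion problem concrete, here is a hedged Python sketch
with KOI8-R and Windows-1251. The helper convertible and the sample
strings are hypothetical; the codecs are the ones bundled with Python.

```python
def convertible(text, src, dst):
    """Return True if text stored in src can be re-encoded in dst without loss."""
    data = text.encode(src)
    try:
        data.decode(src).encode(dst)
        return True
    except UnicodeEncodeError:
        return False

# Plain Russian letters exist in both encodings.
print(convertible("Привет", "koi8_r", "cp1251"))
# KOI8-R has a "less than or equal" sign that Windows-1251 lacks.
print(convertible("Цена ≤ 5", "koi8_r", "cp1251"))
# Windows-1251 has a numero sign that KOI8-R lacks.
print(convertible("№ 5", "cp1251", "koi8_r"))
```

Each direction loses something, so neither encoding can serve as a
safe intermediate form for the other.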
> And also, every script has its own particular properties, for example, letter ordering,
> case sensitivity, numeric systems et.c. It will be difficult to maintain all those
> special particularities of every script in a rigid standard anyway. This will result
> in big overhead, requiring huge amounts of programming and resources to map all those
> orderings and other particularities into one standard interface. The local encodings
> are aware of those particularities and are designed for a particular purpose each.
Local encodings aren't aware of anything. Code is "aware" of those
particularities, and all the local encodings do is make the code more
complex. Unicode lets the code handle those particularities as
consistently as possible.
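A minimal Python sketch of that point, assuming KOI8-R as the local
encoding. The raw bytes carry no case or ordering knowledge; the code
has to supply it. Note that sorting by code point is only a stand-in
here, not real Russian collation.

```python
word = "привет"
koi8_word = word.encode("koi8_r")

# str.upper() uses Unicode case mappings; bytes.upper() only touches ASCII.
print(word.upper())
print(koi8_word.upper() == koi8_word)

letters = ["в", "б", "ц"]
# KOI8-R lays out letters in transliteration order, so byte order is not alphabetical.
by_koi8_bytes = sorted(letters, key=lambda ch: ch.encode("koi8_r"))
# Code point order happens to be alphabetical for these three letters.
by_code_point = sorted(letters)
print(by_koi8_bytes, by_code_point)
```

Handling KOI8-R bytes correctly would need a KOI8-R-specific case and
ordering table, and every other local encoding would need its own.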