From: Alexander Kh. (firstname.lastname@example.org)
Date: Thu May 19 2005 - 01:50:55 CDT
> On 5/18/05, Alexander Kh. <email@example.com> wrote:
> > the fact that local encodings are better thought out in design.
> That's absurd, considering that most local encodings in use were the
> basis for the Unicode encoding of that script--in fact, many
> complexities of Unicode can be attributed to a need for compatibility
> with those local encodings--or were designed as a subset of Unicode.
Huh? Most of them are not even ISO standards, so what are you talking about?
New ones are emerging even today: for example, KOI8-C, which unifies Russian,
Ukrainian, Belorussian, Serbian and Macedonian, plus three letters used in
Russia before the 1920s (YAT', FITA and IZHITSA) that were previously missing.
That encoding is alive and kicking, uses only 8 bits per character, and,
thanks to open source, cost me nothing. With a shift key, even more
letters would fit in!
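To put a number on the 8-bit versus UTF-8 difference, here is a minimal Python sketch. Python ships no KOI8-C codec, so it assumes KOI8-R (codec name `koi8_r`) as a stand-in for an 8-bit Cyrillic encoding.

```python
# Bytes needed for the same Russian word in an 8-bit encoding vs. UTF-8.
# Assumption: KOI8-R stands in for KOI8-C, which Python does not ship.
word = "Привет"  # 6 Cyrillic letters

koi8_bytes = word.encode("koi8_r")  # 1 byte per Cyrillic letter
utf8_bytes = word.encode("utf-8")   # 2 bytes per Cyrillic letter

print(len(word), len(koi8_bytes), len(utf8_bytes))  # 6 6 12
```

UTF-8 only doubles the Cyrillic letters themselves; spaces, digits and ASCII punctuation still take one byte each, so running text grows by somewhat less than 2x.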
> > Also, consider this idea: how about using a code for a "shift" key,
> > which would cut the code space needed in half?
> No one cares. Really. If you want something like this, look up SCSU on
> the Unicode website. But the number of cases where the amount of space
> wasted is important and standard compression algorithms can't be used
> is rare. Adding complexity to save a few bytes isn't a
> good trade-off.
SCSU? Now that's what I call another level of complexity. Gzipping is
enough. However, things like database indexes cannot be gzipped, and
they sometimes make up 70% of the database. Unicode itself is not perfectly
suited to alphabetical sorting, mind you.
> > Consider this example: suppose I have a bilingual database, English-Russian
> > for example. I am not planning to use all the Chinese characters, so why
> > would I use 16-bit characters?
> There is no 8-bit character set that supports both English and
> Russian; the standard Russian character sets don't support accented
> English characters. Besides which, it's rare that you have a large
> stream of "English" data without any Spanish, French or German. I'm
> sure Serbian, Ukrainian and other odd letters slip into Russian text as
> names and in other ways.
KOI8-C is not too bad. It would be better if it encoded a shift key in
the ASCII part; as I mentioned above, the considerable space freed up
could then be used for those missing characters. And again, for embedded
text in a different language I mentioned an encoding selector sequence (an
escape code). Since the scheme would be a modification of UTF-8, the last
resort would be UTF-8's usual way of representing Unicode. CJK ideographs
won't benefit from UTF-8's compactness anyway.
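The quoted point about accented English letters is easy to check. A minimal sketch, again assuming KOI8-R as a representative 8-bit Russian code page: the "é" in "café" has no slot in it, so a strict encode fails and a lenient one silently substitutes "?", while UTF-8 round-trips the text unchanged.

```python
# Round-trip mixed Russian/English text through an 8-bit code page.
# Assumption: KOI8-R represents "an 8-bit Russian encoding" in general.
text = "Привет, café"

# KOI8-R has no "é", so a strict encode raises an error.
try:
    text.encode("koi8_r")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# With errors="replace", the unencodable letter becomes "?".
lossy = text.encode("koi8_r", errors="replace").decode("koi8_r")

# UTF-8 covers every code point, so the text survives unchanged.
lossless = text.encode("utf-8").decode("utf-8")

print(strict_ok)  # False
print(lossy)      # Привет, caf?
```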
> Besides which, it's painful to handle a huge collection of encodings
> and constantly have to do interconversions (which always fail in some
> way, because two 8-bit encodings never have one-to-one mappings).
Mapping is always a problem. Even Unicode has to be remapped to sort some
scripts in alphabetical order. I guess it would make sense to map the
letter "A" of all scripts (where a similar letter exists) to one place,
and then specify which script it belongs to. This would simplify
transliteration, at least between similar scripts. What do you think?
I have not thought about it much yet myself.
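Unicode already records something close to this proposal in its character names: the Latin, Cyrillic and Greek capitals that look like "A" are separate code points, but each name splits into a script part and a letter part. The sketch below uses a hypothetical helper, `split_letter` (not a library function), built on Python's `unicodedata` module; it assumes names of the form "SCRIPT CAPITAL/SMALL LETTER X" and is an illustration, not a transliteration scheme.

```python
import unicodedata

def split_letter(ch: str) -> tuple[str, str]:
    """Hypothetical helper: split a character's Unicode name into
    (script, letter), e.g. "CYRILLIC CAPITAL LETTER A" -> ("CYRILLIC", "A").

    Assumes names of the form "<SCRIPT> CAPITAL|SMALL LETTER <NAME>";
    names with extra qualifiers will not split meaningfully.
    """
    words = unicodedata.name(ch).split()
    return words[0], words[-1]

# Three look-alike capitals, three separate code points.
for ch in ("A", "\u0410", "\u0391"):  # Latin A, Cyrillic A, Greek Alpha
    print(f"U+{ord(ch):04X}", split_letter(ch))
# U+0041 ('LATIN', 'A')
# U+0410 ('CYRILLIC', 'A')
# U+0391 ('GREEK', 'ALPHA')
```

Greek already breaks the pattern (ALPHA rather than A), which suggests a cross-script "same letter" mapping would still need a hand-maintained table.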
> > And also, every script has its own particular properties, for example
> > letter ordering, case sensitivity, numeral systems, etc. It will be
> > difficult to maintain all those particularities of every script in a
> > rigid standard anyway. This will result in big overhead, requiring huge
> > amounts of programming and resources to map all those orderings and
> > other particularities onto one standard interface. The local encodings
> > are aware of those particularities, and each is designed for a
> > particular purpose.
> Local encodings aren't aware of anything. Code is "aware" of those
> particularities, and all the local encodings do is make the code more
> complex. Unicode lets the code handle those particularities as
> consistently as possible.
What I meant, for example, is that KOI8 was designed for simple
transliteration: the order of the letters more or less coincided with
similar Latin letters, so I could read Russian texts even on systems
where no Russian font was installed. The design of the encoding itself
provided this convenience without adding complexity to the code.
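The transliteration property just described can be shown directly: in KOI8-R, clearing the top bit of a Cyrillic letter's byte (subtracting 128) yields a roughly matching Latin letter. A minimal Python sketch using the standard `koi8_r` codec; `strip_high_bit` is a hypothetical helper name. The case comes out inverted, because KOI8-R puts lowercase Cyrillic where ASCII has uppercase Latin.

```python
def strip_high_bit(text: str) -> str:
    """Hypothetical helper: encode as KOI8-R, clear bit 7 of every byte,
    and read the result back as 7-bit ASCII."""
    raw = text.encode("koi8_r")
    return bytes(b & 0x7F for b in raw).decode("ascii")

print(strip_high_bit("Привет"))  # pRIWET
```

The result is crude ("в" comes out as "W", and the case is flipped) but still readable, which is the convenience being described.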
That's just one example of what I meant by "being designed for a
particular purpose". For sorting purposes, of course, it is better if the
letters are in alphabetical order. If I were to sort an Old Slavonic text,
for example, I would have to make my own character map to put the
Unicode letters in the proper order. I don't really see another way to
sort those letters.
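The hand-made character map described here amounts to a custom collation key. A minimal Python sketch; the alphabet order below is an assumption (roughly the pre-1918 Russian alphabet, with і, ѣ, ѳ and ѵ in their traditional places), not a standard Old Slavonic collation, and `old_slavonic_key` is a hypothetical name.

```python
# Assumed order: roughly the pre-1918 Russian alphabet (not a standard).
# \u0456 = і, \u0463 = ѣ (yat), \u0473 = ѳ (fita), \u0475 = ѵ (izhitsa).
ALPHABET = "абвгдежзи\u0456йклмнопрстуфхцчшщъыь\u0463эюя\u0473\u0475"
RANK = {letter: i for i, letter in enumerate(ALPHABET)}

def old_slavonic_key(word: str) -> tuple[int, ...]:
    """Map each letter to its position in ALPHABET; characters not in the
    table sort after every listed letter, in code point order."""
    return tuple(RANK.get(ch, len(ALPHABET) + ord(ch)) for ch in word.lower())

words = ["\u0473ома", "\u0463да", "яма"]  # ѳома, ѣда, яма
print(sorted(words))                        # ['яма', 'ѣда', 'ѳома'] (code point order)
print(sorted(words, key=old_slavonic_key))  # ['ѣда', 'яма', 'ѳома'] (table order)
```

Plain code point order puts ѣ and ѳ after я simply because they were encoded later; the table restores the traditional position of ѣ between ь and э.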
I am sure Unicode will be popular, the way pop music is. Most people
don't use old scripts. As for me, I can't even write a simple text in Old
Slavonic, because some letters are missing. The same goes for Glagolitic.
Maybe this kind of neglect applies only to Slavic scripts, which are being
stepped on. I imagine most people will never understand what my problem is.