Re: ASCII and Unicode lifespan

From: Alexander Kh. (alexkh@writeme.com)
Date: Thu May 19 2005 - 21:27:11 CDT


    Here is a digest of 3 messages I wrote:

    Greetings:

    From: "Mark Davis" <mark.davis@jtcsv.com>

    > > > That I realize. Especially when it is Microsoft who's paying most part
    > > > of the bill
    >
    > Your assertion about "paying most of the bill" is incorrect. The consortium
    > is and has been supported by a wide variety of companies and other
    > organizations, as seen from the membership list on
    > http://www.unicode.org/consortium/memblogo.html. Of course, Microsoft has
    > made other very important contributions as seen on
    > http://www.unicode.org/consortium/donors.html, plus of course all the
    > technical expertise they have contributed --but so have other members,
    > including my own company.

    I apologize for using the general phrase "paying the bill". I did not mean
    the monetary contribution as such. Yes, I have seen that list before.
    What I meant is the role that Unicode support in Microsoft's operating
    systems will play in advertising the encoding.

    > Now, if you mean the amount of money that Microsoft has devoted to
    > implementing the standard -- compared to the amount others have -- that is a
    > different topic. I have a certain degree of skepticism that you are in a
    > position to make any such claim, unless you are miraculously privy to the
    > details of the budgets of all the organizations involved.
    >
    > Mark

    Of course I did not mean that. The monetary manipulations of big companies are
    in no way my concern, since I am not even a shareholder. I am just trying to
    start my own business, but I guess patent issues will sooner or later kill my
    business-to-be whether I use Unicode or not. :-)

    Best regards,

    Alexander Kh.

    --
    From: "Peter Constable" <petercon@microsoft.com>
    > > From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org]
    > > On Behalf Of Alexander Kh.
    >
    >
    > > That I realize. Especially when it is Microsoft who's paying most part
    > > of the bill - I totally foresee that their systems will be based on what
    > > they payed for.
    >
    > If Microsoft disappeared tomorrow, what Mark said would still apply to
    > everybody else.
    >
    True, some other monopolist will take up the flag of supporting Unicode. I
    wonder if the Open Source community will come up with some coding system that
    takes into account the mistakes of Unicode. By the way, I think it is
    worthwhile to collect a list of the shortcomings of Unicode so that the
    history of errors won't repeat itself. I am sure there are reasons for those
    errors, one of which is of course the bureaucracy of a large organization,
    and another is probably the hurry with which the coding table is being made.
    >
    > > However, many people still pay for traffic, and switching from local
    > > encoding to unicode will mean double the traffic right away.
    >
    > A doubling of size in the text content they interchange pales in
    > comparison to the photos sent from cell phones or in email, or all the
    > graphics images they download when they surf the Web, or the MP3 files
    > they download. If all someone ever does is send/receive plain-text
    > email, then this argument is valid, but I don't think there are many
    > people like that today.
    >
    >
    > > However, if using
    > > state-machine approach, encodings can be changed on-the-fly by using a
    > > special escape-code. That's one way of getting benifits of both approach,
    > > not to mention the fact that local encodings are more well-thought in
    > > design.
    >
    > A better approach, rather than using multiple character encodings, is to
    > use a transfer-encoding syntax that can compress the content, such as
    > SCSU.
    >
    > Unicode is, relatively speaking, much more hopeful for you than what they
    > had to deal with.
    I have to deal with KOI-8 and Windows-1251, and use them a lot on my websites.
    Many developers are reluctant to switch to Unicode, for one reason or
    another. One of the reasons is poor support by databases, programming
    languages and web browsers. I have now met new difficulties: Unicode has lots
    of scripts, yes, but they are incomplete, messy, unreliable, and require a lot
    of figuring out what the order of that particular alphabet is, which letters
    are to be considered meaningful and which letters are pure fantasy. So far I
    don't see how Unicode will solve my problem of internationalization. I'm not
    even considering using UTF-16; even UTF-8 seems too wasteful as it is now.
    Most of my messages will be in two languages anyway.
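    The size argument can be checked directly. The following is a minimal
    sketch, assuming Python 3 and its standard codecs; the sample strings and
    the helper name encoded_sizes are made up for illustration, and KOI8-R is
    assumed as the KOI-8 variant in question.

```python
# Compare how many bytes the same text needs in legacy 8-bit Cyrillic
# encodings versus the Unicode encoding forms.
# Assumption: "koi8_r" stands in for the KOI-8 variant discussed above.

CODECS = ["koi8_r", "cp1251", "utf_8", "utf_16_le"]


def encoded_sizes(text: str) -> dict:
    """Return the encoded length in bytes of `text` for each codec."""
    return {name: len(text.encode(name)) for name in CODECS}


if __name__ == "__main__":
    # Hypothetical samples: pure Russian, pure English, and a mixed message.
    for sample in ["Привет, мир", "Hello, world", "Hello, мир"]:
        print(repr(sample), encoded_sizes(sample))
```

    Under these assumptions, Cyrillic letters take one byte in the 8-bit
    encodings and two bytes in both UTF-8 and UTF-16, while ASCII letters
    take one byte everywhere except UTF-16. The doubling therefore applies
    to the Russian part of a message, not to the English part in UTF-8.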
    > > Consider this example: suppose I have a bilingual database:
    > > English-Russian for example. I am not planning to use all the Chinese
    > > Hieroglyphs, so why would I use 16-bit characters???
    >
    > Unless you're storing this database on a cell phone or PDA, what do you
    > care? Hard disk volume is cheap.
    >
    I want to store the most often used data in RAM. Hard drives are too slow, and
    I often see websites that take more time to generate a page than to send it.
    Don't you?
    >
    > > ... This will result
    > > in big overhead, requiring huge amounts of programming and resources to
    > > map all those orderings and other particularities into one standard
    > > interface. The local encodings are aware of those particularities and
    > > are designed for a particular purpose each. It will be more reasonable
    > > to continue using local encodings for some applications.
    >
    > If you're creating an application that needs to work for only certain
    > languages, using Unicode doesn't require you to support *all* of Unicode
    > in that app.
    >
    > Peter Constable
    Totally agree with that.
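    Supporting only part of Unicode can be as simple as a range check at the
    application boundary. This is a rough sketch, assuming an English-Russian
    application that accepts only ASCII and the Unicode Cyrillic block
    (U+0400..U+04FF); the function name in_supported_subset is hypothetical.

```python
# Accept only the code points an English-Russian application cares about.
# Assumption: ASCII plus the Cyrillic block U+0400..U+04FF is the whole
# supported repertoire; a real application may need more (e.g. punctuation).

ASCII_MAX = 0x7F
CYRILLIC_FIRST, CYRILLIC_LAST = 0x0400, 0x04FF


def in_supported_subset(text: str) -> bool:
    """Return True if every character is ASCII or in the Cyrillic block."""
    for ch in text:
        cp = ord(ch)
        if cp <= ASCII_MAX or CYRILLIC_FIRST <= cp <= CYRILLIC_LAST:
            continue
        return False
    return True


if __name__ == "__main__":
    print(in_supported_subset("Hello, мир"))  # both scripts are supported
    print(in_supported_subset("日本語"))        # outside the chosen subset
```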
    Best regards,
    Alexander Kh.
    --
    ----- Original Message -----
    From: "John H. Jenkins" <jenkins@apple.com>
    To: Unicode <unicode@unicode.org>
    Subject: Re: ASCII and Unicode lifespan
    Date: Thu, 19 May 2005 09:23:33 -0600
    >
    > Having an ISO 2022-type approach means that not only do I have to
    > keep track of all the complexity that Unicode requires but I must
    > *also* deal with the additional headache of the bookkeeping
    > associated with the multiple encodings (converting data back and
    > forth, among other things) *and* the bookkeeping of maintaining the
    > state information. If I'm writing a word processor, it means I have
    > to be prepared for the document to switch character sets halfway
    > through.
    That can be partially solved by taking a more object-oriented approach
    to defining character sets: provide commonly used functions together
    with the table, such as sorting lists of strings, transliterating
    to Latin, mapping to Unicode, and other service functions. Incorporated
    directly into the software, those objects would make it possible to deal
    with that particular encoding if it needs to be parsed at all.
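    One way to picture such an object is sketched below, assuming Python 3.
    The class CharsetBundle, its fields, and the tiny transliteration table
    are all hypothetical; only the codec name "cp1251" is a real Python
    codec. The sort uses an explicit alphabet because plain code point order
    would place "ё" (U+0451) after "я" (U+044F).

```python
# A hypothetical "code table plus service functions" object.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CharsetBundle:
    codec: str                 # name of an existing Python codec
    alphabet: str              # letters in their dictionary order
    translit: Dict[str, str]   # per-letter Latin transliteration

    def to_unicode(self, raw: bytes) -> str:
        """Map bytes in this encoding to a Unicode string."""
        return raw.decode(self.codec)

    def from_unicode(self, text: str) -> bytes:
        """Map a Unicode string back to bytes in this encoding."""
        return text.encode(self.codec)

    def sort(self, words: List[str]) -> List[str]:
        """Sort words by the bundle's alphabet, not by code point."""
        rank = {ch: i for i, ch in enumerate(self.alphabet)}
        unknown = len(rank)  # letters outside the alphabet sort last
        return sorted(
            words, key=lambda w: [rank.get(c, unknown) for c in w.lower()]
        )

    def transliterate(self, text: str) -> str:
        """Replace each known letter with its Latin equivalent."""
        return "".join(self.translit.get(c, c) for c in text)


# Illustrative instance; the transliteration table is deliberately partial.
russian_cp1251 = CharsetBundle(
    codec="cp1251",
    alphabet="абвгдеёжзийклмнопрстуфхцчшщъыьэюя",
    translit={"м": "m", "и": "i", "р": "r"},
)

if __name__ == "__main__":
    raw = russian_cp1251.from_unicode("мир")
    print(russian_cp1251.to_unicode(raw))
    print(russian_cp1251.transliterate("мир"))
    print(russian_cp1251.sort(["яма", "ёж", "еда"]))
```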
    >
    > In other words, you don't save effort at all. A state-based
    > multiple-encoding world is considerably *more* difficult for the
    > programmer. All you save is storage space on disk and transmission,
    > and in today's world, that's really not an enormous cost anymore.
    A state-based approach will mean that random searching will not work unless
    the blocks of differently encoded text are indexed. As far as space is
    concerned, it's like money: 2 million is always better than 1 million,
    isn't it?
    > 3) For users of minority and rare languages and scripts, the fact
    > that there has to be additional effort to create and maintain
    > software which supports their particular needs means that their
    > needs are never met...
    You still need sorting algorithms and transliteration rules for
    Unicode, and as the table grows, the complexity of the program grows
    with it, since the confusion multiplies with the table's growth: the
    letters get more and more spread out.
    > (So far as I know, nobody ever implemented ISO 2022 in all its
    > glory; they just had a specific market they wanted to focus on and
    > stuck there.) Large companies aren't willing to invest that
    > effort for small markets, so there isn't support at the system
    > level, and shoe-horning support into the system by a third party is
    > difficult if not impossible. (I know whereof I speak, having
    > written the Deseret Language Kit for Mac OS 8.) With the Unicode
    > approach, since you get every script and language for free,
    > additional scripts and languages can be supported via add-ons with
    > minimal effort. Even third-party add-ons will work in most cases
    > with relatively little effort.
    That's why I mentioned the OOP approach: service functions should come
    with the code table, for ease of use. That's just an idea...
    Best regards,
    Alexander Kh.
    --
    


    This archive was generated by hypermail 2.1.5 : Thu May 19 2005 - 21:28:15 CDT