Re: Text Editors and Canonical Equivalence (was Coloured diacritics)

From: Benjamin Peterson (ben@jbrowse.com)
Date: Thu Dec 11 2003 - 13:15:54 EST

    On Thu, 11 Dec 2003 09:05:10 -0800, "Michael (michka) Kaplan"
    <michka@trigeminal.com> said:

    > I think you are mostly mistaken here. All of the programmers I know (i.e.
    > script kiddies need not apply? <grin>) call APIs. The bulk of those APIs
    > deal with APIs that have no notion of any of this. They take LPWSTR or
    > WCHAR * and a developer who does not know what those are or who incorrectly
    > assumes that they are grapheme clusters will not be able to function very
    > effectively.

    That is the current situation for some, but it is not a desirable or
    permanent situation, nor an intrinsic property of non-'script kiddy'
    programming. Those APIs used to take char*s, and before that they took
    7-bit byte addresses, and those bad days are now behind (most of) us.

    As an application programmer, I would certainly consider a system that
    insulated me from the byte/WCHAR representation of a string (except when
    asked not to) to be a better system. And systems are improving in this
    respect year by year. I see that in .NET I can actually step through a
    string accented-character by accented-character -- wonder upon wonders!
    With a lot of luck I might never have to use a C-style array as a string
    again.
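
    To illustrate what stepping through a string "accented-character by
    accented-character" means, here is a minimal sketch in Python (not .NET).
    The helper name text_elements is hypothetical, and the grouping rule
    (a base character plus any following combining marks) is a simplifying
    assumption; real grapheme cluster segmentation covers more cases.

```python
import unicodedata

def text_elements(s):
    """Yield each base character together with the combining marks after it.

    Approximation only: a character whose canonical combining class is
    non-zero is attached to the preceding base character.
    """
    cluster = ""
    for ch in s:
        if cluster and unicodedata.combining(ch) == 0:
            # A new base character starts a new text element.
            yield cluster
            cluster = ch
        else:
            # Combining mark (or very first character): extend the current element.
            cluster += ch
    if cluster:
        yield cluster

# "résumé" spelled with combining acute accents (U+0301)
word = "re\u0301sume\u0301"
elements = list(text_elements(word))
print(len(word), "code points,", len(elements), "text elements")
```

    The caller never touches the underlying array; it only sees whole
    elements such as "e" plus its accent.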

    > Most programmers (even ones who DO deal with grapheme clusters)
    > need to be working below the level to which you are referring here.
    >

    This is true, but it is a result of inadequacies in their environments,
    inadequacies that are being fixed quite rapidly.

    I remember what a huge barrier the division between single and multibyte
    text once seemed -- and what a huge advance it was when the Win32 API
    became widespread and finally you could translate your English data to
    Chinese without a ground-up review of the entire system (well, unless you
    had to deal with GNU utils or Unix). Now the 'characters that are
    composed of more than one byte' barrier is behind us and we are pushing
    up against a new barrier: text elements that are composed of more than
    one character (a base character plus combining marks). I fully expect
    this challenge to be overcome as
    well -- the linguistic details may go on forever but in terms of
    implementing your friendly local string type it ain't rocket science. If
    application programmers are still looking at arrays of WCHARs in ten
    years it'll be very surprising -- and _very_ depressing.
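
    As a rough illustration of the successive barriers described above, the
    Python sketch below measures one string at each level (bytes, 16-bit
    code units, code points, text elements) and checks canonical equivalence
    by normalizing both sides. The function names layer_counts and
    canonically_equal are hypothetical, and the text element count is an
    approximation that simply skips combining marks.

```python
import unicodedata

def layer_counts(s):
    """Return the length of s at several representation levels."""
    return {
        "utf8_bytes": len(s.encode("utf-8")),
        # 16-bit code units, roughly what a WCHAR array would hold
        "utf16_units": len(s.encode("utf-16-le")) // 2,
        "code_points": len(s),
        # Approximation: count only characters that are not combining marks
        "text_elements": sum(1 for ch in s if unicodedata.combining(ch) == 0),
    }

def canonically_equal(a, b):
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT
precomposed = "\u00e9"   # the same letter as a single code point

print(layer_counts(decomposed))
print(layer_counts(precomposed))
print(canonically_equal(decomposed, precomposed))
```

    The two spellings differ at every level except the last one, which is
    the level an application programmer usually cares about.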

    -- 
      Benjamin Peterson
      bjsp123@imap.cc
    

