Re: Keys. (derives from Re: Sequences of combining characters.)

From: Barry Caplan (
Date: Sat Sep 28 2002 - 12:50:35 EDT

    At 12:24 PM 9/27/2002 +0100, William Overington wrote:
    >>You tell me which one is more
    >>likely to result in productive work and adoption by others.
    >Likelihood of success and what actually happens are not the same thing. I
    >do not know which is more likely as I do not know of what has happened

    Well, as you mentioned, the nature of scholarly research demands that you are familiar with the basis for the arguments being presented.
    If your goal is merely to build such a system, I am sure everyone is willing to concede that it is technically feasible, even bordering on trivial. It is not interesting in a scholarly sense at all, so it is only your ego that is going to benefit.

    If your goal is to enjoy some commercial success, well, that may be possible too. The utility of the application will be strongly limited by its lack of interoperability with other existing systems, many of which are used by the likely community of users for your system. That community has these choices:

            - Not use your system
            - Use your system and never interchange data
            - Use your system and roll their own tools to do data interchange
            - Use your system and demand data interchange tools from their other vendors
            - Use your tool and demand data interchange tools from you
            - Create a closed source functional near-equivalent of your tool with data interchange facilities
            - Create an open source functional near-equivalent of your tool with data interchange facilities

    Ponder very carefully the implications of each of these upon: the utility (usefulness and value) of your software, the effects on your limited resources of needing to support an extra layer of data interchange, and the effects on other vendors' limited resources of being asked to support data interchange with a proprietary format in limited use.

    If you want to share the program with a handful of folks, your proposal might fly. If you want real people in real places on earth to contribute text, then I predict issues will arise and you will lose all control, because the last item in the list above will occur.

    Just to give you my sense of how much work it would take to do that: I think about an intense week or so per type of UI (GNOME, Web, etc.) is about right for any experienced open source programmer, based on your description of the functionality and the availability of major modules such as XML, message catalog, UI, database and Web support.

    >Some people may have deleted the email, some may have read it and
    >disregarded it, yet it is possible that some people might have tried to
    >produce a comet circumflex button on the screen using an all-Unicode font
    >and might be considering the possibilities of how the system could be
    >applied or might even be writing an experimental software program which can
    >take comet circumflex sequences and process them through a database.

    Speaking of reading the sources, you might want to read Richard Dawkins' The Selfish Gene and other related works on memes to get a sense of why any alternative to XML for data interchange is likely to fail in the marketplaces of business and of ideas even if technically feasible.

    >The topic of keys generally which I have introduced

    Why are you claiming credit for a system which has been a core part of programming APIs since probably the 1960s? You can search for the documentation online for the "printf" function and its relatives for *nix, or resource APIs for Windows and Mac for a good start.

    Any translator who has done localization is familiar with the use of parameterized sentences that you describe, and why they are a problem when it comes to translation. I am sure I am not the only localization consultant on the list who preaches a very limited use of them (what I call "constructed sentences").
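
    To make the problem with constructed sentences concrete, here is a minimal sketch using Python's printf-style formatting. The templates, names and function names are hypothetical, made up for illustration only. It shows two things: positional placeholders fix the word order, and even named placeholders cannot repair grammatical agreement.

```python
# Hypothetical templates, for illustration only.
# Positional, printf-style: argument order is fixed by the source language.
TEMPLATE_POSITIONAL = "%s deleted %d files."

# Named placeholders let a translator reorder the parts of the sentence,
# but the surrounding words still cannot adapt to the values.
TEMPLATE_NAMED = "%(count)d files were deleted by %(user)s."


def render_positional(user, count):
    """Fill the positional template; the order of arguments is hard-wired."""
    return TEMPLATE_POSITIONAL % (user, count)


def render_named(user, count):
    """Fill the named template; placeholders may appear in any order."""
    return TEMPLATE_NAMED % {"user": user, "count": count}


if __name__ == "__main__":
    print(render_named("Anna", 3))       # 3 files were deleted by Anna.
    print(render_positional("Anna", 1))  # Anna deleted 1 files.  (wrong plural)
```

    The last line is ungrammatical even in English. In languages with richer plural, gender or case agreement the same template breaks in more ways, which is why I preach limited use.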

    >is potentially a
    >far-reaching development in the application of markup in Unicode based

    It's been done to death in the past. See Trados, Uniscape, GlobalSight, and countless in-house systems. The only revolutionary aspect is that you want to throw away all the experience and consensus that has been developing in the sw development, i18n, l10n, and translation communities about proper workflow and data interchange. If you had come to me with such a tool in 1990, Unicode notwithstanding, it might have been useful. But now, standalone tools are much less useful for a lot of reasons I won't go into here.

    >My own comet circumflex system may be highly useful in business
    >communications and distance education.

    May be, but most likely not. That you think so indicates you are after a commercial market, and I refer you to the discussion above of likely outcomes.

    >I am happy to respond to questions
    >and to consider documents which people suggest.

    I have suggested a lot in a message yesterday and a lot more here. I hope your future messages will take the material I have suggested into account.

    >XML exists and it uses U+003C in a way that makes using U+003C with the
    >meaning LESS-THAN SIGN in body text intermixed with markup sections awkward.
    >That feature of XML may not matter for situations involving encoding simply
    >literary works, yet for a comprehensive system which can include the U+003C
    >character with the meaning LESS-THAN SIGN in body text and in markup
    >parameters, it does not suit my need.

    You may be under the mistaken impression that any but the tiniest amount of raw XML is ever edited by hand. If you think your message creators are going to write your files directly in the actual markup language, whether XML or "Comet Circumflex", well, that just won't happen often in practice. A UI which handles, well, the User Interface, will be needed, making the choice of markup language moot until it comes to what other systems can accept.
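
    As a small illustration of why U+003C in body text is rarely anyone's problem in practice, here is a sketch using Python's standard library. The element name "message" and the sample text are assumptions for the example, not part of any real schema. The library escapes the character when writing and restores it when reading, so neither the author nor the UI ever sees the markup form.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Body text containing U+003C with its ordinary LESS-THAN SIGN meaning.
body = "if a < b then stop"

# Escaping by hand, if one ever needed to: "<" becomes "&lt;".
escaped = escape(body)

# More typically the XML library does it on serialization.
elem = ET.Element("message")  # hypothetical element name
elem.text = body
serialized = ET.tostring(elem, encoding="unicode")

# Parsing the serialized form gives back the original text unchanged.
round_tripped = ET.fromstring(serialized).text

if __name__ == "__main__":
    print(escaped)        # if a &lt; b then stop
    print(serialized)     # <message>if a &lt; b then stop</message>
    print(round_tripped)  # if a < b then stop
```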

    >It is not a fact that my proposed markup convention, as you call it, is not
    >a good idea. It may be your opinion and it might perhaps be the opinion of
    >some other people. Yet my proposed markup convention, as you call it, is
    >entirely within the rules, for keys generally, as in my original post, and
    >for my comet circumflex key in particular.

    No one is saying it is not valid Unicode. From a market acceptance point of view, you have seen a consensus that there are a lot of reasons why it probably is not a good idea, coming from people I know to have an enormous amount of experience in these specific matters upon which to draw these conclusions.

    I for one would be interested if you could come up with some others whose opinion supports your own, although perhaps off list is the place for that.

    >Why should the discussion be taken elsewhere? It is about the application
    >of Unicode to markup and of one particular application to language
    >translation in a manner where Unicode could be widely used, as the comet
    >circumflex system could be used with all of the languages which Unicode

    Well, the moderator keeps letting it go on... if not I am willing to carry it ad infinitum on - just click on "Submit story" on any page....

    >Actually, I was rather hoping that, with your specific interest in languages
    >that you would have wished to have a try at using the comet circumflex
    >system as one of the features of the comet circumflex system is that it
    >could be used with minority languages as easily as with the major languages
    >of the world.

    If I may speak for Peter, I think he would be willing to consider it were it XML based.

    However, I offer the caveat that you may be in for some rude surprises when you find out how hard it is to actually translate beyond the simplest sentences (and sometimes even that) when you parameterize them as you propose. As far as localization goes, I have been of the opinion for several years that it is better just to take out the parameters and list all the possibilities.

    Of course in the general case that may not be possible, but then you are in the realm of machine translation, which already exists for better or worse. So in your case, you may also need to make a case that your solution is more useful than just listing non-parameterized sentences, yet more likely to provide a useful translation than existing machine translation systems. Based on the example sentences about the weather in (London, Berlin, Tokyo) etc. from your original post, I would say that is a very open question.
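
    Here is a minimal sketch of what "list all the possibilities" can look like, assuming a hypothetical catalog keyed by sentence id. The cities come from the example in the original post; the weather conditions and the key scheme are invented for illustration. Every entry is a whole sentence, so a translator can treat each one as a unit instead of guessing how fragments will combine.

```python
from itertools import product

# Cities from the original example; the conditions are made up for illustration.
CITIES = ["London", "Berlin", "Tokyo"]
CONDITIONS = ["raining", "snowing"]


def build_catalog(cities, conditions):
    """Expand every city/condition pair into a complete source sentence.

    The key scheme "weather.<city>.<condition>" is hypothetical.
    """
    return {
        f"weather.{city}.{cond}": f"It is {cond} in {city}."
        for city, cond in product(cities, conditions)
    }


catalog = build_catalog(CITIES, CONDITIONS)

if __name__ == "__main__":
    # Each line below would be handed to the translator as one complete unit.
    for key, sentence in catalog.items():
        print(key, "->", sentence)
```

    The catalog grows as the product of the parameter ranges (here 3 x 2 = 6 entries), which is exactly why this stops being possible in the general case.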

    Barry Caplan