An application of Supplementary Private Use Area-B

From: William Overington (WOverington@ngo.globalnet.co.uk)
Date: Tue Apr 15 2003 - 04:01:09 EDT

    I posted the following to the DigitalTV group at the http://www.cenelec.org
    webspace last Saturday, 12 April 2003.

    I find it fascinating to consider to what extent this system could be used
    for broadcasting information which can be easily and promptly translated
    into the language of the end user and displayed upon the screen of his or
    her DVB-MHP interactive television.

    It is clear that some simple sentences can be used in this manner. Yet how
    far does that capability go? Can the system be used for e-commerce? I am
    finding that once the system is considered as a mathematical structure,
    the question arises of the nature of language and how it can be represented
    using mathematics. This is fascinating. By only using preset sentences this system
    avoids many of the problems of automated language translation, such as those
    of parsing an input sentence which is to be translated.

    The abbreviation DVB-MHP is for Digital Video Broadcasting - Multimedia Home
    Platform. Details of the system can be found at the http://www.mhp.org
    webspace. The DVB-MHP system uses Java programs and Java uses Unicode.

    William Overington

    15 April 2003

    ----
    Some possibilities regarding using many languages.
    In 2002 I carried out some initial research into the possibility of using a
    special encoding of preset sentences so as to facilitate the sending of
    messages in a format which could be easily translated into a large number of
    other languages.  I am now considering applying the experience gained in
    that initial research to producing a system specifically intended only for
    use upon a DVB-MHP channel under the carefully controlled conditions which
    are possible with a broadcast channel.
    This research adapted, for a different application domain and using
    computerized methods, a system of coded messages which was once widely used
    on railway systems over the telegraph.
    The following web page documents one such coding used with a telegraph
    system.
    http://www.railpage.org.au/telecode/
    Please consider the following.
    http://www.railpage.org.au/telecode/tc05.gif.html
    Some messages are complete in themselves.  Some of the messages can be
    customized with one or more parameters, depending upon the particular
    message.  A parameter could be the name of a place, a time or the index number
    of a locomotive.
    The possibility arises of using such a system to convey messages to end
    users upon a DVB-MHP channel which is broadcast to a number of countries,
    where a variety of languages are spoken, by having a collection of preset
    messages, some of which may be customized using one or more parameters,
    which may be processed in the DVB-MHP television set of an end user and then
    displayed in the local language.  The DVB-MHP television set would gather
    from the object carousel of the DVB-MHP channel the database files necessary
    to perform the translation from the transmitted codes into the natural
    language (for example, Finnish, German, Estonian) of the viewer of the
    television display.  The end user would simply have had to run an
    introductory program which asked for a choice of display language to be
    selected.
    For a carefully chosen collection of sentences the usefulness of such a
    system could be enormous within the European Union.  The selection available
    would need to be large enough to allow application for activities such as
    almost real-time encoding and translation of weather information and weather
    forecasts, road traffic information, some distance education applications
    and so on.  The selection would also need to be small enough that
    translating all of the sentences once into many languages would be a
    realistic possibility, that the database files could be broadcast upon a
    DVB-MHP channel, and that a European Union interactive television could
    handle the processing.
    The telesoftware system can treat the object carousel as a read-only disc
    drive in the sky and store only part of the database in the end user
    television at any one time.  That capability could be useful in allowing
    specialist sentence collections to be used easily, yet it would need to be
    balanced against the speed requirement for producing the display.  The
    balance would depend upon whether the translation needed to be fairly
    real-time or whether some delay would be acceptable, and it might vary
    greatly from one application to another.
    My initial research is available on the web.  It will hopefully provide some
    idea of the possibilities.  It is called the comet circumflex system.
    However, that system is initial research.  It has provided valuable
    experience upon which to build a system specifically intended only for use
    upon a DVB-MHP channel under the carefully controlled conditions which are
    possible with a broadcast channel.  The new system is somewhat different
    and more advanced.
    http://www.users.globalnet.co.uk/~ngo/c_c00000.htm
    The following web page has a link for downloading a code chart.
    http://www.unicode.org/charts/
    The chart is the following file.
    http://www.unicode.org/charts/PDF/U100000.pdf
    It is about Supplementary Private Use Area-B of the Unicode code space.
    This is for the 65534 characters in the range U+100000 through to U+10FFFD.
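    The size of that range can be checked with a little arithmetic.  The
    following is only a minimal sketch: the class name SpuaBRange and its
    methods are made up for illustration, and it uses the code point methods
    of java.lang.Character from later Java versions, which a given DVB-MHP
    receiver may or may not provide.  It also shows that each such code point
    occupies two 16-bit char values in a Java String.

```java
final class SpuaBRange {
    // Bounds of Supplementary Private Use Area-B as given in the text.
    static final int FIRST = 0x100000;
    static final int LAST = 0x10FFFD;

    // Number of code points in the range, both ends included.
    static int size() {
        return LAST - FIRST + 1;
    }

    static boolean contains(int codePoint) {
        return codePoint >= FIRST && codePoint <= LAST;
    }

    // A Java String holding exactly one code point (a surrogate pair here).
    static String asString(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        System.out.println(size());                          // 65534
        String s = asString(FIRST);
        System.out.println(s.length());                      // 2 char values
        System.out.println(s.codePointCount(0, s.length())); // 1 code point
    }
}
```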
    I am now starting to design a system where each whole preset sentence is
    represented by one character code from the range U+100000 through to
    U+10FFFD.
    Thus, for example, a sentence such as "It is snowing." would be encoded
    using one character code.
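    As a rough illustration of the idea, the following sketch looks up one
    code point in a per-language table.  Everything specific in it is an
    assumption made for the example: the class name SentenceTable, the choice
    of U+100000 for "It is snowing.", and the in-memory tables, which in a
    real receiver would presumably come from the broadcast database files.
    It is written in present-day Java rather than whatever Java profile a
    DVB-MHP receiver offers.

```java
import java.util.HashMap;
import java.util.Map;

final class SentenceTable {
    // Hypothetical assignment; the text does not fix any code point values.
    static final int IT_IS_SNOWING = 0x100000;

    // language tag -> (code point -> sentence in that language)
    private static final Map<String, Map<Integer, String>> TABLES = new HashMap<>();
    static {
        Map<Integer, String> en = new HashMap<>();
        en.put(IT_IS_SNOWING, "It is snowing.");
        Map<Integer, String> de = new HashMap<>();
        de.put(IT_IS_SNOWING, "Es schneit.");
        Map<Integer, String> fi = new HashMap<>();
        fi.put(IT_IS_SNOWING, "Sataa lunta.");
        TABLES.put("en", en);
        TABLES.put("de", de);
        TABLES.put("fi", fi);
    }

    // Replaces every known sentence code with its text in the chosen
    // language; any other code point is copied through unchanged.
    static String decode(String message, String language) {
        Map<Integer, String> table = TABLES.get(language);
        if (table == null) {
            throw new IllegalArgumentException("no table for " + language);
        }
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < message.length()) {
            int cp = message.codePointAt(i);
            String sentence = table.get(cp);
            if (sentence != null) {
                out.append(sentence);
            } else {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp); // step over a whole surrogate pair
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String message = new String(Character.toChars(IT_IS_SNOWING));
        System.out.println(decode(message, "de"));
        System.out.println(decode(message, "fi"));
    }
}
```

    The same transmitted message is displayed differently depending only upon
    the language chosen in the receiver.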
    A sentence such as "It is snowing in Mainz." would have one character code
    for the sentence part "It is snowing in" and a method of encoding the name
    of the City of Mainz as a parameter.  A name such as Mainz could be a
    literal name, as it would not be translated.  A name such as Rome or
    Florence would be a character code from a range of the U+100000 through to
    U+10FFFD code space assigned for a list of major cities which are translated
    into local languages.
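    A one parameter sentence of this kind might be handled along the following
    lines.  This is a sketch under stated assumptions, not a design: the
    message layout (sentence code first, then the parameter), the code point
    values, the class name CitySentence and the small tables are all invented
    for the example, and the code is present-day Java.  A parameter which is a
    single known city code is translated; anything else is treated as a
    literal name.

```java
import java.util.HashMap;
import java.util.Map;

final class CitySentence {
    // Hypothetical code point assignments.
    static final int SNOWING_IN = 0x100001;
    static final int CITY_ROME = 0x10F000;
    static final int CITY_FLORENCE = 0x10F001;

    private static final Map<String, String> TEMPLATES = new HashMap<>();
    private static final Map<String, Map<Integer, String>> CITIES = new HashMap<>();
    static {
        TEMPLATES.put("en", "It is snowing in %s.");
        TEMPLATES.put("de", "Es schneit in %s.");
        Map<Integer, String> en = new HashMap<>();
        en.put(CITY_ROME, "Rome");
        en.put(CITY_FLORENCE, "Florence");
        Map<Integer, String> de = new HashMap<>();
        de.put(CITY_ROME, "Rom");
        de.put(CITY_FLORENCE, "Florenz");
        CITIES.put("en", en);
        CITIES.put("de", de);
    }

    static String cp(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    // Assumed layout: SNOWING_IN, then either one city code or a literal name.
    static String decode(String message, String language) {
        String template = TEMPLATES.get(language);
        if (template == null) {
            throw new IllegalArgumentException("no table for " + language);
        }
        int first = message.codePointAt(0);
        if (first != SNOWING_IN) {
            throw new IllegalArgumentException("unknown sentence code");
        }
        String parameter = message.substring(Character.charCount(first));
        String place = parameter; // literal names are shown as sent
        if (!parameter.isEmpty()) {
            int code = parameter.codePointAt(0);
            String translated = CITIES.get(language).get(code);
            if (translated != null && parameter.length() == Character.charCount(code)) {
                place = translated; // coded city, shown in the local language
            }
        }
        return String.format(template, place);
    }

    public static void main(String[] args) {
        System.out.println(decode(cp(SNOWING_IN) + "Mainz", "de"));
        System.out.println(decode(cp(SNOWING_IN) + cp(CITY_ROME), "de"));
    }
}
```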
    A two parameter sentence, used for a sentence such as "The temperature in
    Mainz is 21 degrees Celsius." would have one code point for the main
    structure of the sentence and the Java program in the DVB-MHP television
    would use whatever value it had in the locality register of its language
    system engine and whatever value it had in the numerical data register 1 of
    its language system engine to produce the text stream which is to be
    displayed upon the screen for the end user.  The language system engine
    would be a software construct within a Java program, and that Java program
    would itself have been broadcast.  The system would appear to the end user
    as just providing detailed, up-to-date weather information in his or her
    own language.
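    To make the register idea concrete, here is one possible shape for such an
    engine.  It is a sketch only: the class and method names, the code point
    value and the way in which the registers are loaded are assumptions for
    illustration, since how the broadcast stream sets the registers is not yet
    designed, and the code is present-day Java rather than the Java profile of
    a DVB-MHP receiver.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class LanguageSystemEngine {
    // Hypothetical code point for the two parameter temperature sentence.
    static final int TEMPERATURE_IN = 0x100002;

    private final Map<Integer, String> templates = new HashMap<>();
    private String localityRegister = "";
    private int numericalDataRegister1;

    // Loads the sentence table for one display language (en or de here).
    LanguageSystemEngine(String language) {
        if ("de".equals(language)) {
            templates.put(TEMPERATURE_IN, "In %1$s sind es %2$d Grad Celsius.");
        } else if ("en".equals(language)) {
            templates.put(TEMPERATURE_IN, "The temperature in %1$s is %2$d degrees Celsius.");
        } else {
            throw new IllegalArgumentException("no table for " + language);
        }
    }

    void setLocality(String name) {
        localityRegister = name;
    }

    void setNumericalData1(int value) {
        numericalDataRegister1 = value;
    }

    // Expands one sentence code using the current register contents.
    String render(int sentenceCode) {
        String template = templates.get(sentenceCode);
        if (template == null) {
            throw new IllegalArgumentException("unknown sentence code");
        }
        // Locale.ROOT keeps the digits independent of the receiver's locale.
        return String.format(Locale.ROOT, template, localityRegister, numericalDataRegister1);
    }

    public static void main(String[] args) {
        LanguageSystemEngine engine = new LanguageSystemEngine("de");
        engine.setLocality("Mainz");
        engine.setNumericalData1(21);
        System.out.println(engine.render(TEMPERATURE_IN));
    }
}
```

    Keeping the parameters in registers means that the sentence code itself
    stays a single code point, whatever the number of parameters.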
    As there are many languages in use throughout the European Union, such a
    system might be very useful for fairly quick translation of fairly
    predictable types of information, such as the sentences required to produce
    a weather forecast.
    William Overington
    12 April 2003
    

