Re: XML and tags (LONG) (derives from Re: Plane 14 Tag Deprecation Issue)

From: William Overington (
Date: Fri Feb 21 2003 - 06:38:17 EST


    Marco Cimarosti kindly responded to my post about XML and plane 14 tags.
    Here are some initial comments so as hopefully to set the scene. I am
    studying Marco's examples with interest. I am already learning a lot from
    them. I shall be interested to observe how other people on the Unicode
    mailing list respond to Marco's post.

    I do notice, however, that Marco does not program in XML any of the examples
    which I suggested in my post.

    Marco wrote.


    (Warning: I have probably succeeded in the impossible task of being more
    verbose than Mr. Overington. Please start reading only if you have a little
    free time... :-)

    end quote

    Well, I think that a verbose writing style often assists clarity. Writing a
    document is different from having a conversation. In a conversation if
    someone does not follow a line of reasoning then the person putting forward
    the reasoning can respond to questions or to body language and make a more
    detailed explanation, or perhaps approach the matter from another direction:
    in a posting there is a need to try to set out an argument with clarity of
    meaning all at once.


    >My job is to implement software based on written specifications which
    represent my bosses' understanding of the requirements of our customers.
    Unfortunately, the specifications I receive are often verbose and fuzzy like
    Mr. Overington's posts...

    Well, my posts are often verbose, yet, I like to think, never fuzzy.

    >I will be pretending that William is actually "Overington Inc.", one of the
    key customers of the company I work with, and that they are asking me to
    implement a protocol to send text over the famous "Overington Multimedia
    Broadcasting (OMB)", with the following requirements:

    So rather than being willing to respond to an individual you pretend to
    respond to a non-existent company which cannot vote in elections yet which
    may be able to pay to have a vote at a Unicode Technical Committee meeting,
    which an individual cannot do! :-)

    Yet this pretend scenario may be why one of the main points which I am
    trying to make has been missed. There is no such thing as OMB, and there
    will not be, because I want to use the DVB-MHP (Digital Video
    Broadcasting - Multimedia Home Platform) system, which is a system which
    can be used by everybody. Since 1978 I have been advocating that there
    should be a manufacturer-independent standard for telesoftware broadcasts.
    There have been many delays and problems in getting telesoftware
    implemented on a lasting basis because so many implementations of
    telesoftware were based on proprietary system specifications. The DVB-MHP
    system is a manufacturer-independent system which uses Java. My suggested
    portable object code system, 1456 object code (in speech, please say
    "fourteen fifty-six object code"), was an early attempt at such
    portability; its first format, which I later discarded, dates back to 1978.
    However, although Java is the standard for DVB-MHP programs, Java doing far
    more than 1456 object code ever did, 1456 object code did have one feature
    which Java does not, and today 1456 object code has been redesigned to make
    use of that one feature, sitting on top of a Java platform. I like to think
    that 1456 object code today, which sits on top of a Java platform and uses
    Java for standardization of its data types, is a system which is useful in
    some, though by no means all, situations where a Java quality graphics
    effect is wanted on either the web or upon the DVB-MHP platform. 1456
    object code is particularly useful where people wish to write relatively
    short programs with Java quality graphics yet do not have ready access to a
    Java compiler and may not know any Java, for example a distance education
    author who wants to produce an interactive illustration. This is
    because 1456 object code can be written directly in ASCII printable
    characters using a text editor such as Notepad. However, for use within a
    UTF-16 system, EA00 hexadecimal can be added to each of those characters so
    that the 1456 object code can be expressed using (some of the) Private Use
    Area characters in the range U+EA00 to U+EA7F.
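    The EA00 offset is simple arithmetic: a printable ASCII character with
    code c becomes the Private Use Area character U+EA00 + c, so "A" (0x41)
    becomes U+EA41. A minimal Python sketch follows, assuming the program text
    uses only printable ASCII (0x20 to 0x7E); the function names are
    hypothetical and are not part of any published 1456 object code
    specification.

```python
# Hypothetical helpers illustrating the EA00 offset described above.
PUA_OFFSET = 0xEA00


def ascii_to_pua(program: str) -> str:
    """Shift each printable ASCII character into the Private Use Area."""
    out = []
    for ch in program:
        code = ord(ch)
        if not 0x20 <= code <= 0x7E:
            raise ValueError(f"not printable ASCII: {ch!r}")
        out.append(chr(PUA_OFFSET + code))
    return "".join(out)


def pua_to_ascii(encoded: str) -> str:
    """Reverse the shift, rejecting anything outside U+EA20..U+EA7E."""
    out = []
    for ch in encoded:
        code = ord(ch) - PUA_OFFSET
        if not 0x20 <= code <= 0x7E:
            raise ValueError(f"not a shifted 1456 character: {ch!r}")
        out.append(chr(code))
    return "".join(out)


if __name__ == "__main__":
    shifted = ascii_to_pua("A1")
    print([f"U+{ord(c):04X}" for c in shifted])  # ['U+EA41', 'U+EA31']
    assert pua_to_ascii(shifted) == "A1"
```

    Because every shifted character stays within the Basic Multilingual
    Plane, each one occupies a single 16-bit code unit in UTF-16.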

    However, 1456 object code is just a format for using within a Java program;
    that is, the Java program treats the 1456 object code software program as
    being data for a specific Java program which is called a 1456 engine. So
    1456 object code is not a standard system; it is just a programming language
    which can be processed by a Java program, so it can be used on the DVB-MHP
    platform without in any way needing the DVB-MHP system specification to be
    altered to accommodate its use. The DVB-MHP platform does specify the use
    of Java in a very detailed manner.

    That essential difference between how the DVB-MHP system relates to Java and
    how the DVB-MHP system relates to 1456 object code is important. It is, in
    my opinion, the same sort of difference as between how the Unicode
    specification relates to plane 14 tags and how the Unicode specification
    relates to element names in an XML file. I feel that that is the essential
    point which I am trying to convey.

    > 1. The text MUST be transmitted in UTF-8 (because the CEO of
    Overington Inc. thinks that UTF-8 is cute).

    Well, I, as an individual, was thinking in terms of UTF-16.

    >2. The transmission protocol MUST implement some form of language
    tagging (the details of the protocol are up to me). Particularly, the
    system needs to distinguish English text from Italian text, because the two
    languages will be displayed in different colors (green and red,

    Green for English, red for Italian. Are you by any chance a fan of the
    liveries of motor racing cars of the 1950s?

    > 3. The OveringtonHomeBox(tm) can only accept UTF-8 plain text
    interspersed with escape sequences to change color. The escape sequences
    have the form "{{color=1}}", where "1" is the id of a color (blue, in this

    If I were writing a one-off program I would use U+F3E2 for red and U+F3E5
    for green.
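    One way to keep a standard encoding in the transmitted text while still
    using such private codes internally is to translate plane 14 language
    tags into colour codes only inside the receiving program. The Python
    sketch below assumes the assignments just mentioned (U+F3E5 green for
    English, U+F3E2 red for Italian); the function name, the use of only the
    primary language subtag, and the choice to drop cancel tags and
    unrecognised languages are illustrative assumptions, not part of any
    published protocol.

```python
# Assumed colour assignments, taken from the post: U+F3E5 (green) for English
# and U+F3E2 (red) for Italian. These are Private Use Area code points whose
# meaning exists only by private agreement.
COLOUR_FOR_LANG = {"en": "\uF3E5", "it": "\uF3E2"}

LANGUAGE_TAG = 0xE0001  # U+E0001 LANGUAGE TAG
CANCEL_TAG = 0xE007F    # U+E007F CANCEL TAG
TAG_FIRST, TAG_LAST = 0xE0020, 0xE007E  # tag characters mirroring ASCII 0x20..0x7E


def tags_to_colour_codes(text: str) -> str:
    """Replace plane 14 language tags with private colour codes (hypothetical)."""
    out = []
    i = 0
    while i < len(text):
        cp = ord(text[i])
        if cp == LANGUAGE_TAG:
            # Collect the tag characters that spell out the language code.
            i += 1
            chars = []
            while i < len(text) and TAG_FIRST <= ord(text[i]) <= TAG_LAST:
                chars.append(chr(ord(text[i]) - 0xE0000))
                i += 1
            # Use only the primary subtag, so "en-GB" is treated as "en".
            primary = "".join(chars).lower().split("-")[0]
            # Unrecognised languages (and an empty tag) produce no colour code.
            out.append(COLOUR_FOR_LANG.get(primary, ""))
        elif cp == CANCEL_TAG or TAG_FIRST <= cp <= TAG_LAST:
            i += 1  # cancel tags and stray tag characters are dropped
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    tagged = ("\U000E0001\U000E0065\U000E006E" "Hello "  # tag "en"
              "\U000E0001\U000E0069\U000E0074" "Ciao"    # tag "it"
              "\U000E007F")                              # cancel all tags
    print(repr(tags_to_colour_codes(tagged)))  # '\uf3e5Hello \uf3e2Ciao'
```

    The transmitted text thus uses only standard code points; the private
    codes exist only inside the receiving program.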

    However, the issue is not, in my opinion, about one-off programs and
    proprietary encodings. The issue is ensuring that plane 14 tags are not
    totally deprecated, so that they remain available, as an option for use
    with particular protocols, for encodings intended for general computing
    usage and widespread information availability on a rigorous,
    non-proprietary basis. Certainly, within certain
    multimedia programs which might at some future time run upon the DVB-MHP
    platform, codes such as U+F3BC might be particularly useful, yet that is a
    matter which an individual programmer needs to consider when writing such a
    program: it is not a standard system, though it is not a proprietary system
    either, in the usual sense of the word, as those codes are published in
    the hope of being a consistent set which people may use if they so choose.
    Please note that, notwithstanding your pretend scenario of a company, that
    is not the way I am proceeding with my research. I invented the
    telesoftware concept and am doing what I can to get it used effectively and
    to ensure that it can have scope for future development of content. I
    regard the continued availability of plane 14 tags as important, as it means
    that content authors can then use codes which do the job by finding them in
    an international standard, without having to use what I suggest. I could
    devise all manner of codes using plane 16 if I wish, copying the plane 14
    tags across as a start, yet those codes, no matter how fine, no matter how
    well publicised in research papers or in a book or whatever, those codes
    would never have the provenance of the codes in an international standard.
    That is why, although Private Use Area codes do certainly have a use for
    research and for concept proving, and also for limited use between two or
    more people studying some special topic, Private Use Area codes, and
    XML element names made up by a programmer or even by a committee which is
    not a standards committee, simply do not come into the same class of
    provenance quality as plane 14 tags which are in the Unicode standard. That
    is why I hope that the Unicode Technical Committee will not totally
    deprecate tags and will leave open the possibility of considering adding
    additional tag types at some time in the future.

    > 4. The text files being transmitted MUST be .... small (bandwidth is

    Yes, keep the text file size down, bandwidth is limited.
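    Since bandwidth is the constraint, it may help to measure what each form
    of tagging costs on the wire. The Python sketch below counts bytes for a
    short English-then-Italian sample in UTF-8 and UTF-16. One point is fixed
    by the encodings themselves: every plane 14 tag character lies outside
    the Basic Multilingual Plane, so it takes four bytes in UTF-8 and a
    four-byte surrogate pair in UTF-16. The XML figures, by contrast, depend
    entirely on the hypothetical element names "text" and "phrase" chosen
    here, so the comparison is indicative only.

```python
from xml.sax.saxutils import escape, quoteattr


def plane14_tag(lang):
    # U+E0001 LANGUAGE TAG, then tag characters U+E0020..U+E007E mirroring ASCII
    return "\U000E0001" + "".join(chr(0xE0000 + ord(c)) for c in lang)


def plane14_text(runs):
    # The closing U+E007F CANCEL TAG is optional; it is included for completeness
    return "".join(plane14_tag(lang) + text for lang, text in runs) + "\U000E007F"


def xml_text(runs):
    # 'text' and 'phrase' are made-up element names; xml:lang is defined by XML
    body = "".join(f"<phrase xml:lang={quoteattr(lang)}>{escape(text)}</phrase>"
                   for lang, text in runs)
    return f"<text>{body}</text>"


def sizes(s):
    # utf-16-le is used so that no byte order mark is counted
    return len(s.encode("utf-8")), len(s.encode("utf-16-le"))


runs = [("en", "Hello"), ("it", "Ciao")]
print("plane 14:", sizes(plane14_text(runs)))  # (37, 46)
print("XML:     ", sizes(xml_text(runs)))      # (84, 168)
```

    Each two-letter language tag costs twelve bytes in either encoding (three
    supplementary characters at four bytes each), whereas ASCII text doubles
    in size in UTF-16, so the balance between the encodings depends on the
    proportion of tags to text.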

    > 5. The processing program must be .... small (on-board memory is

    No, for DVB-MHP the on-board memory is fairly large. The transmission link
    is the key issue.

    > 6. A working prototype must be ready by tomorrow.

    Well, this is about the way that these things will be done well into the
    future. The idea of Unicode is that it will last, not be swept away within
    ten or twenty years because it is outdated for future needs.

    I have had a look through the example solutions, but I do need to spend
    some more time studying them and hopefully trying out the executable
    programs with some other data files. In the meantime I would be interested
    to know any further views of Marco and the views of others on this topic.

    Thank you for taking the time to write your post and prepare the programs.
    I feel that it is important that this matter be studied thoroughly.

    William Overington

    21 February 2003
