Re: An idea for keeping U+FFFC usable. (spins off from Re: Furigana)

From: William Overington (WOverington@ngo.globalnet.co.uk)
Date: Fri Aug 16 2002 - 11:58:58 EDT


Tex Texin wrote as follows.

>William,
>
>So let me see if I understand this correctly.
>
>Let's take 2 perfectly good standards, Unicode and HTML,

Yes.

and make some
>very minor tweaks to them,

No.

such as changing the meaning of U+FFFC and a
>special format for filenames in the beginning of the file and a new
>extension, so we have something new.

I have suggested no changes whatsoever to HTML at all.

The only thing which I have suggested in relation to Unicode in this thread
follows from the fact that information about the object to which any
particular use of U+FFFC refers is kept outside the character data stream.
I suggested that it could be a good idea to define a file format .uof so
that details of the names of the files for which the U+FFFC codes are
anchors could be provided in a known format, if and only if end users chose
to use a .uof file for that purpose on that occasion and not otherwise.
This was in the context of seeking to protect the use of U+FFFC as a
character which can be used in the interchange of documents. That context
follows from the discussion of U+FFFC and annotation characters in the
thread from which I spun off this thread, which discussion, by Ken and
Doug, is repeated in the first posting of this present thread.

I thought it a good idea that the Unicode Technical Committee might like to
make such a .uof file format an official Unicode document so as to offer one
possible way to use U+FFFC codes. That is now a matter for discussion. If
the Unicode Consortium wishes to do that, then fine. If the Unicode
Consortium chooses not to do that, then I can write it up myself and publish
it, which is not such a good solution, yet is adequate for my own needs and
might be useful for some other people if they choose to use the same format
for .uof files.

I hope that I have now managed to raise the issue of protecting the use of
the U+FFFC character in document interchange, and that the character will
not become deprecated to the status of a noncharacter.

There is a practical reason for this, which is, from my own perspective,
quite important. This is as follows.

The DVB-MHP (Digital Video Broadcasting - Multimedia Home Platform) system
(details at http://www.mhp.org ) implements my telesoftware invention.
A Java program which has been broadcast can read a Unicode plain text file
and act upon the characters within it, and can read other file formats, such
as .png files (Portable Network Graphics) and act upon the information in
those files, so as to produce a display.

So, a collection of files could be broadcast from a direct broadcasting
satellite or a terrestrial transmitter as a package of distance education
learning material which is free to the end user. The collection would be a
.uof file in the format that I suggested, a Unicode plain text file with one
or more U+FFFC characters in it, and the appropriate graphics files in .png
format. That could be a very useful facility as a way to carry text with
illustrations.
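
As a rough sketch of how a receiver might check such a package before
displaying it, the following Python fragment counts the U+FFFC characters in
the text file and compares them with the entries in the .uof file. The .uof
layout used here (one file name per line, UTF-8 encoded) is only an
assumption for illustration, as are the function name check_package and the
example file names. The 8-byte PNG signature is the standard one.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"   # standard 8-byte PNG file signature
OBJECT_REPLACEMENT = "\uFFFC"          # U+FFFC OBJECT REPLACEMENT CHARACTER


def check_package(files, text_name, uof_name):
    """Return a list of problems found in a package (empty if consistent).

    files maps file name -> raw bytes. Assumption: the text file and the
    .uof file are UTF-8, and the .uof file lists one file name per line.
    """
    problems = []
    text = files[text_name].decode("utf-8")
    uof_lines = files[uof_name].decode("utf-8").splitlines()
    names = [line.strip() for line in uof_lines if line.strip()]

    # Each U+FFFC anchor should have exactly one entry in the .uof file.
    anchors = text.count(OBJECT_REPLACEMENT)
    if anchors != len(names):
        problems.append(f"{anchors} U+FFFC anchors but {len(names)} .uof entries")

    # Each listed file should be present and look like a PNG file.
    for name in names:
        data = files.get(name)
        if data is None:
            problems.append(f"missing file: {name}")
        elif not data.startswith(PNG_SIGNATURE):
            problems.append(f"not a PNG file: {name}")
    return problems


# Hypothetical package contents, standing in for broadcast files.
package = {
    "lesson.txt": "See the diagram: \uFFFC".encode("utf-8"),
    "lesson.uof": b"diagram.png\n",
    "diagram.png": PNG_SIGNATURE + b"(image data would follow)",
}
print(check_package(package, "lesson.txt", "lesson.uof"))
```

A receiver with no return link cannot ask for a missing file, so a check of
this kind would simply decide whether the package as received is complete.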

Using HTML and a browser is just not the way to proceed in that situation.
HTML and a browser is a very useful technique for the web and indeed is an
option for the DVB-MHP system, yet the basic software system is Java based.
It is as if the television set is acting as a computer which has a slow,
read-only disc drive in the sky from which it may gather information,
including software. The system is interactive, by means of the telesoftware
invention, even though there is no return information link to the central
broadcasting computer. Overlays and virtual running are possible, so that
programs bigger than the local storage can be run using chaining techniques.
Please do not think of this as downloading, as no uplink request is made!

>Now the big benefit of this completely new thing,

Well, it's only a way of sender and receiver being able to have information
in a file with the suffix .uof about what objects are being anchored by
U+FFFC codes in a Unicode plain text file which it accompanies.

is that programs that
>do desktop publishing can use plain text files which are not quite plain
>text because they have some special formatting,

Well, the plain text files are only Unicode plain text which might contain
one or more U+FFFC characters and some of the other Unicode control
characters such as CARRIAGE RETURN.
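
To illustrate that such a file remains ordinary Unicode plain text, here is
a minimal Python sketch that splits a text into paragraphs at CARRIAGE
RETURN characters and records the offsets of any U+FFFC characters. The
function name scan_plain_text and the sample string are hypothetical.

```python
CARRIAGE_RETURN = "\r"         # U+000D, used here as the paragraph separator
OBJECT_REPLACEMENT = "\uFFFC"  # U+FFFC OBJECT REPLACEMENT CHARACTER


def scan_plain_text(text):
    """Return (paragraphs, anchor_offsets) for a Unicode plain text string.

    paragraphs is the text split at CARRIAGE RETURN characters;
    anchor_offsets lists the index of each U+FFFC character in the text.
    """
    paragraphs = text.split(CARRIAGE_RETURN)
    anchor_offsets = [i for i, ch in enumerate(text) if ch == OBJECT_REPLACEMENT]
    return paragraphs, anchor_offsets


sample = "First paragraph.\rSecond \uFFFC paragraph."
paragraphs, offsets = scan_plain_text(sample)
print(len(paragraphs), offsets)
```

A text with no U+FFFC characters gives an empty list of offsets, in which
case no accompanying .uof file would be needed.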

but now they can publish
>them in better manner than before.

Well, my thinking is that it would help to have a well known way to express
the meaning of the anchors encoded by U+FFFC in a file rather than having
only a vague specification that all other information about the object is
kept outside the data stream. I am saying that, yes, all other information
about the object is kept outside the data stream and, if, and only if, end
users choose to use a .uof file in a standard format to convey that
information for some particular use of a U+FFFC code, then that format could
be considered for definition and publication by the Unicode Consortium.
That does not seem unreasonable to me. If I have suggested something which
is not possible then please say so and I can publish such a .uof format
myself. However, as U+FFFC is a regular Unicode character I did not feel
that it would be correct to just do that unless the Unicode Consortium does
not wish to do that.

Certainly, I would much prefer that the Unicode Consortium chose to publish
a .uof format, yet the key issue in this thread is that the U+FFFC code
point not become deprecated as a noncharacter, so that it can continue to
be used in the future in Unicode compliant systems.

>For example, plain text with
>pictures. This is great. (It is true that it is less capable than if we
>had just used enough html to do the same thing, but .uof is more like
>plain text than html is.) Programmers will be happy because now they can
>support plain text with just a few tweaks. Oh I almost forgot, they also
>have to support Unicode, but slightly tweaked. And they can also support
>HTML, with some minor tweaks for .uof.

No tweaks for HTML or for Unicode, just, if they choose to use it, a
convenient standardized way to express the meanings of the anchors, using a
separate, accompanying .uof file, which is just a plain text file or a
straightforward fancy text file at the choice of the programmer, depending
upon whether minimum or greater file addressing information is included in
the .uof file.

>Of course programmers don't mind
>supporting lots of variations of the same thing.

Lots of variations of what thing?

The variation is implicit in the vagueness of the definition of the U+FFFC
character in the Unicode specification. My suggestion is more likely to
reduce the number of variations of the way of conveying information about
the object which a particular use of the U+FFFC character anchors.

>Customer support
>personnel also don't mind.

Customer support, once shown how to use a .uof file, might well find that a
.uof saves them a lot of work and they know what to look for rather than
having to try to work out what is happening using just the vagueness of the
Unicode specification on the matter. Anyway, it is hardly an issue which is
likely to come up very often except in specialist situations.

>Oh, the plain text programmers will now need to support pictures and
>other aspects of full publishing, but at least they won't have a complex
>file format to work with. I guess it doesn't matter that a more complex
>format is also more expressive and therefore can leverage all of the
>publishing features.

No they won't. If they do not use a U+FFFC code in their plain text files,
then no pictures are involved.

>It probably doesn't matter that a desktop
>publishing product probably already supports more complex formats, and
>probably also supports html, it will be beneficial to add this slight
>difference from plain text.

Well, it depends which packages someone can afford in the first place. Not
everybody is able to afford the expensive packages manufactured by various
companies. Some people might welcome being able to program their own
applications.

>
>I like this very much.

Good.

>It is very much like when the magician slides the
>knot in the string and makes it disappear.

Ah?

>I imagine that over time we will have some more wonderful inventions and
>add further tweaks and further improve the publishing of plain text.

Hopefully all three of those imaginings will come true.

>There are a few other things I would like to improve in Unicode, so I
>hope it will be ok to make some other suggestions.

Yes, certainly.

> We can change the
>extension to know which tweaks we are talking about. .uo1, .uo2. Just a
>few small changes to characters and plain text format variations.

Well, there could be various file name extensions if you wish, as two names
for the same thing is no problem: it is when there is one name for two
different things that problems occur.

Characters cannot be changed. The names are fixed as part of the Unicode
standard. Also, in relation to the courtyard codes which I have published I
will not change the meanings of any published code point allocations. If,
however, you wish to suggest some additional facility I will try to encode
what you suggest if that is possible.

>Stability of the meaning of the file isn't important.

Well, I disagree on that. Stability of meaning is very important. For
example, my published code point collections in the Private Use Area have
stability of meaning as a key feature. People who choose to use them can be
sure that I will not change the meanings of code point allocations which I
have published.

>
>However, I think my first suggestion will be to make the benefits of
>.uof available to XML. We can call this .uo1.

I have no idea how to do this, so if you could please explain I would be
interested to learn.

>
>I am a little disconcerted that html already can do everything that .uof
>does plus more, and is also supported by all of the publishers that are
>likely to support .uof.

Well, HTML can do more than can .uof by a long way, yet HTML cannot quite do
everything which a .uof file could do, as far as I am aware, though I am
willing to learn if the situation is different.

For example, suppose that a book is being made available as a Unicode plain
text file and it is desired to add just a few illustrations without a major
reformatting of the whole text, which uses CARRIAGE RETURNS to indicate
paragraphs. A text editor could be used to insert a few U+FFFC characters
at appropriate places in the file and a .uof file could be used to carry a
list of the names of the illustration files in the order in which they are
used. Conversion to HTML format would require a larger file and would limit
the ways in which the file could be displayed to just the use of an HTML
browser.
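
The book example can be sketched in Python as follows. This is only an
illustration under stated assumptions: the .uof file is taken to be a list
of illustration file names, one per line, in the order in which they are
used, and the names parse_uof, resolve_anchors, map.png and portrait.png are
hypothetical.

```python
OBJECT_REPLACEMENT = "\uFFFC"  # U+FFFC OBJECT REPLACEMENT CHARACTER


def parse_uof(uof_text):
    """Assumed .uof layout: one file name per line, in order of use."""
    return [line.strip() for line in uof_text.splitlines() if line.strip()]


def resolve_anchors(text, uof_text):
    """Pair each U+FFFC in the text with the next file name from the .uof.

    Returns ("text", run) and ("object", file_name) items in reading order.
    Raises ValueError if the number of anchors and names differ.
    """
    names = parse_uof(uof_text)
    runs = text.split(OBJECT_REPLACEMENT)
    if len(runs) - 1 != len(names):
        raise ValueError(
            f"{len(runs) - 1} U+FFFC anchors but {len(names)} file names"
        )
    items = []
    for index, run in enumerate(runs):
        if run:
            items.append(("text", run))
        if index < len(names):
            # The n-th anchor refers to the n-th name listed in the .uof.
            items.append(("object", names[index]))
    return items


# A two-paragraph "book" with one illustration after each paragraph.
book = "Chapter one.\r\uFFFC\rChapter two.\uFFFC"
uof = "map.png\nportrait.png\n"
for kind, value in resolve_anchors(book, uof):
    print(kind, repr(value))
```

The plain text itself is unchanged apart from the inserted U+FFFC
characters, and a program is free to display the resulting items in
whatever way suits it.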

> Also, as there are more than a million characters
>in Unicode, most are unused so far, so changing the meaning of just FFFC
>in this one context doesn't seem like a big win, considering also every
>line of code that might work with FFFC now needs to consider the context
>to determine its semantics.

I don't follow what you mean.

However, the meaning of U+FFFC is not, I hope, going to be changed at all.
I have simply suggested an optional way of indicating, outside of the
plain text file which contains one or more U+FFFC characters, the extra
information as to which object each U+FFFC character is anchoring.

>But every invention deserves to be implemented, we need not look at
>whether the invention satisfies some demand of its customers.

I disagree with this. My view is that not every invention deserves to be
implemented, and indeed that not every invention needs to be considered as
consideration takes time and may cost money. However, I do feel strongly,
and have for many years, that when an invention is considered it should be
considered on its merits and without prejudice. An example of such
prejudice is "not representing an organisation" discrimination, where an
invention is turned down because it has been suggested by someone who is
not representing a company. As to customer needs, certainly an invention
that satisfies an existing need meets that criterion, yet it is also the
case that sometimes the need does not exist until potential customers become
aware of what has become possible and then begin to have a need, or desire,
for it.

>I like the 2 birds picture and I assume it was a metaphor for the idea-
>one bird was html the other unicode. I was a little disappointed that
>you used html instead of .uof format though.

The picture of the birds has been in our family webspace since 1998 as an
illustration for the saying "Painting two birds on one canvas". That
saying, originated by me, is a peaceful saying meaning to achieve two
results by one activity. I made the picture from clip art as a learning
exercise.

The picture of the birds is referenced as a way of illustrating the saying
"Painting two birds on one canvas". It is not the picture in the story
about which Ken asked.

I am interested in creative writing, so when Ken asked about the story, I
just thought of something to put in my response. Part of the training in,
and the fun of, creative writing is to be able to write something promptly
to a topic.

The two birds are not a metaphor for HTML and Unicode at all. Ken put two
illustrations in his posting so I put one in mine. It all adds to the
interest for readers.

William Overington

16 August 2002


