Re: An idea for keeping U+FFFC usable. (spins off from Re: Furigana)

From: William Overington
Date: Fri Aug 16 2002 - 02:07:24 EDT

Kenneth Whistler wrote as follows about my idea.

>> It occurs to me that it is possible to introduce a convention, either as
>> matter included in the Unicode specification, or as just a known about
>> thing, that if one has a plain text Unicode file with a file name that has
>> some particular extension (any ideas for something like .uof for Unicode
>> object file)
>...or to pick an extension, more or less at random, say ".html"

Well, that could produce confusion with a .html file used for Hyper Text
Markup Language, HTML.

I suggested .uof so that a .uof file would be known as being for this
purpose.

>> that accompanies another plain text Unicode file which has a
>> file name extension such as .txt, or indeed other choices except .uof (or
>> whatever is chosen after discussion) then the convention could be that the
>> .uof file has on lines of text, in order, the name of the text file then the
>> names of the files which contain each object to which a U+FFFC character
>> provides the anchor.
>> For example, a file with a name such as story7.uof might have the following
>> lines of text as its contents.
>> story7.txt
>> horse.gif
>> dog.gif
>> painting.jpg
>This is a shaggy dog story, right?

No, it is a story about an artist who wanted to paint a picture of a horse
and a picture of a dog and, since he knew that the horse and the dog were
great friends and liked to be together and also that he only had one canvas
upon which to paint, the artist painted a picture of a landscape with the
horse and the dog in the foreground, thereby, as the saying goes, painting
two birds on one canvas, in that he achieved two results by one activity.
In addition the picture has various interesting details in the background,
such as a windmill in a plain (or is that a windmill in a plain text
file?). :-)

>> The file story7.uof could thus be used with a file named story7.txt so as to
>> indicate which objects were intended to be used for three uses of U+FFFC in
>> the file story7.txt, in the order in which they are to be used.
>Or we could go even further, and specify that in the story7.html file,
>the three uses of those objects could be introduced with a very specific
>syntax that would not only indicate the order that they occur in, but
>could indicate the *exact* location one could obtain the objects -- either
>one's own machine or even anywhere around the world via the Internet! And
>we could even include a mechanism for specifying the exact size that the
>object should be displayed. For example, we could use something like:
><img src="" width="380"
> height="260" border="1">
><img src="">

Now that is a good idea. In a .uof file specifically for the purpose, a
line beginning with a < character could be used to indicate a web based
reference, or a local reference, for the object, using exactly the same
format as is used in an HTML file.

If the line does not start with a < character, then it is simply a file name
in the same directory as the .uof file, as I suggested originally. This
would mean that where, say, a .uof file were broadcast upon a telesoftware
service, the Java program (also broadcast) analysing the file names in the
.uof file need not be able to decode lines starting with a < character, and
so need not carry the software for that decoding. Yet the same .uof file
specification could be used both in a telesoftware service and on the web,
where a more comprehensive method of referencing objects is needed.
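
As a rough illustration of the two kinds of line, here is a minimal sketch
in Python of how a program might read a .uof file. Nothing here is part of
any agreed specification: the names UofEntry, UofFile, parse_uof and
plain_names are invented for the example, and I have assumed that blank
lines are ignored and that a simple reader marks a line starting with a <
character as "not decoded" rather than discarding it, so that the order of
the objects is preserved.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class UofEntry:
    raw: str         # the line exactly as written (whitespace trimmed)
    is_markup: bool  # True if the line starts with '<'


@dataclass
class UofFile:
    text_file: str           # first line: name of the plain text file
    objects: List[UofEntry]  # remaining lines, in U+FFFC order


def parse_uof(content: str) -> UofFile:
    """Split .uof content into the text file name and the object lines."""
    # Assumption: blank lines carry no meaning and are skipped.
    lines = [ln.strip() for ln in content.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("a .uof file needs at least the text file name")
    entries = [UofEntry(raw=ln, is_markup=ln.startswith("<"))
               for ln in lines[1:]]
    return UofFile(text_file=lines[0], objects=entries)


def plain_names(uof: UofFile) -> List[Optional[str]]:
    """What a simple reader sees: file names, or None for '<' lines."""
    return [None if e.is_markup else e.raw for e in uof.objects]
```

A simple reader would call parse_uof and then plain_names, and could show
some placeholder wherever the result is None. A fuller reader would instead
pass each entry with is_markup set to whatever decoder it has for the <
lines.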

>> I can imagine that such a widely used practice might be helpful in bridging
>> the gap between being able to use a plain text file or maybe having to use
>> some expensive wordprocessing package.
>And maybe someone will write cheaper software -- we could call it a
>"browser" -- that could even be distributed for free, so that people could
>make use of this convention for viewing objects correctly distributed with
>respect to the text they are embedded in.

Indeed, except that I would not call it a browser, as the name is already in
widespread use for HTML browsers and might cause confusion. Analysing a .uof
file would be a much lighter computational task than analysing the complete
syntax of HTML files.
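
To give a feel for how small the task is, here is a minimal sketch in Python
of the central step, namely matching each U+FFFC in the text, in order, with
the corresponding object name. The function name pair_anchors is invented
for the example. Treating a mismatch between the number of U+FFFC characters
and the number of object names as an error is my assumption, as the
convention as described so far does not say what should happen in that case.

```python
from typing import List, Tuple

OBJECT_REPLACEMENT = "\uFFFC"  # U+FFFC OBJECT REPLACEMENT CHARACTER


def pair_anchors(text: str, object_names: List[str]) -> List[Tuple[int, str]]:
    """Pair each U+FFFC position in text with an object name, in order."""
    positions = [i for i, ch in enumerate(text) if ch == OBJECT_REPLACEMENT]
    # Assumption: counts must agree; the convention itself is silent on this.
    if len(positions) != len(object_names):
        raise ValueError(
            f"{len(positions)} anchors but {len(object_names)} object names"
        )
    return list(zip(positions, object_names))
```

For the story7 example, the text of story7.txt would be passed together with
the list horse.gif, dog.gif, painting.jpg, and the result would say at which
character position each picture belongs.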

>Yes, yes, I think this is an idea which could fly.

Good. It is a solution which could be very useful for people writing
programs in Java, Pascal and C and so on which programs take in plain text
files and process them for such purposes as producing a desktop publishing

Hopefully the Unicode Technical Committee will be pleased to add a .uof
format file specification into the set of Unicode documents so that the
U+FFFC code can be used in an effective manner. The idea could be that if a
.uof file is processed then the rules of .uof files apply in that situation,
so that if a .uof file is not being processed, then the rules for .uof files
do not apply, therefore there is no question that the publication of a .uof
file specification by the Unicode Consortium would prejudice the rights of
anyone to use the U+FFFC character in any other manner.

Publication of such a .uof file specification would also prevent U+FFFC
being made into a noncharacter and keep the facility of using the U+FFFC
character in interchanged documents available for all, whether they choose
to use the .uof file format or some other format for explaining the meaning
of any U+FFFC codes in a given document.

Could this be discussed at the Unicode Technical Committee meeting next week?

William Overington

16 August 2002
