Re: An idea for keeping U+FFFC usable. (spins off from Re: Furigana)

From: Tex Texin (tex@i18nguy.com)
Date: Fri Aug 16 2002 - 03:32:24 EDT


William,

So let me see if I understand this correctly.

Let's take 2 perfectly good standards, Unicode and HTML, and make some
very minor tweaks to them, such as changing the meaning of U+FFFC and a
special format for filenames in the beginning of the file and a new
extension, so we have something new.

Now the big benefit of this completely new thing is that programs that
do desktop publishing can use plain text files which are not quite plain
text because they have some special formatting, but now they can publish
them in a better manner than before. For example, plain text with
pictures. This is great. (It is true that it is less capable than if we
had just used enough html to do the same thing, but .uof is more like
plain text than html is.) Programmers will be happy because now they can
support plain text with just a few tweaks. Oh I almost forgot, they also
have to support Unicode, but slightly tweaked. And they can also support
HTML, with some minor tweaks for .uof. Of course programmers don't mind
supporting lots of variations of the same thing. Customer support
personnel also don't mind.
Oh, the plain text programmers will now need to support pictures and
other aspects of full publishing, but at least they won't have a complex
file format to work with. I guess it doesn't matter that a more complex
format is also more expressive and therefore can leverage all of the
publishing features. It probably doesn't matter that a desktop
publishing product probably already supports more complex formats, and
probably also supports html; it will be beneficial to add this slight
difference from plain text.

I like this very much. It is very much like when the magician slides the
knot in the string and makes it disappear.

I imagine that over time we will have some more wonderful inventions and
add further tweaks and further improve the publishing of plain text.

There are a few other things I would like to improve in Unicode, so I
hope it will be ok to make some other suggestions. We can change the
extension to know which tweaks we are talking about. .uo1, .uo2. Just a
few small changes to characters and plain text format variations.
Stability of the meaning of the file isn't important.

However, I think my first suggestion will be to make the benefits of
.uof available to XML. We can call this .uo1.

I am a little disconcerted that html already can do everything that .uof
does plus more, and is also supported by all of the publishers that are
likely to support .uof. Also, as there are more than a million characters
in Unicode, most are unused so far, so changing the meaning of just FFFC
in this one context doesn't seem like a big win, considering also every
line of code that might work with FFFC now needs to consider the context
to determine its semantics.
But every invention deserves to be implemented; we need not look at
whether the invention satisfies some demand of its customers.

I like the 2 birds picture and I assume it was a metaphor for the idea:
one bird was html, the other unicode. I was a little disappointed that
you used html instead of .uof format though.

Maybe it's the lateness of the hour here. I hope the idea looks as good
in the morning.

Oh I almost forgot. I was having difficulty discerning when you and Ken
might be joking. The mails read very seriously. I would like to suggest we
make a new format .uo2. We can indicate line numbers and emotions with
plain text characters that look like facial expressions. It would help
me know when you both were serious and when you might be joking.
Sometimes it is hard to tell. I am going to create a list of facial
expressions and assign them in the PUA so we can all have a standard to
follow. See my next mail with a list of facial expressions and
assignments.
tex

William Overington wrote:
>
> Kenneth Whistler wrote as follows about my idea.
>
> >> It occurs to me that it is possible to introduce a convention, either
> >> as a matter included in the Unicode specification, or as just a known
> >> about thing, that if one has a plain text Unicode file with a file
> >> name that has some particular extension (any ideas for something like
> >> .uof for Unicode object file)
> >
> >...or to pick an extension, more or less at random, say ".html"
>
> Well, that could produce confusion with a .html file used for Hyper Text
> Markup Language, HTML.
>
> I suggested .uof so that a .uof file would be known as being for this
> purpose.
>
> >
> >> that accompanies another plain text Unicode file which has a
> >> file name extension such as .txt, or indeed other choices except .uof
> >> (or whatever is chosen after discussion) then the convention could be
> >> that the .uof file has on lines of text, in order, the name of the
> >> text file then the names of the files which contains each object to
> >> which a U+FFFC character provides the anchor.
> >>
> >> For example, a file with a name such as story7.uof might have the
> >> following lines of text as its contents.
> >>
> >> story7.txt
> >> horse.gif
> >> dog.gif
> >> painting.jpg
> >
> >This is a shaggy dog story, right?
>
> No, it is a story about an artist who wanted to paint a picture of a horse
> and a picture of a dog and, since he knew that the horse and the dog were
> great friends and liked to be together and also that he only had one canvas
> upon which to paint, the artist painted a picture of a landscape with the
> horse and the dog in the foreground, thereby, as the saying goes, painting
> two birds on one canvas, http://www.users.globalnet.co.uk/~ngo/bird0001.htm
> in that he achieved two results by one activity. In addition the picture
> has various interesting details in the background, such as a windmill in a
> plain (or is that a windmill in a plain text file). :-)
>
> >> The file story7.uof could thus be used with a file named story.txt so
> >> as to indicate which objects were intended to be used for three uses
> >> of U+FFFC in the file story7.txt, in the order in which they are to be
> >> used.
> >
> >Or we could go even further, and specify that in the story7.html file,
> >the three uses of those objects could be introduced with a very specific
> >syntax that would not only indicate the order that they occur in, but
> >could indicate the *exact* location one could obtain the objects --
> >either on one's own machine or even anywhere around the world via the
> >Internet! And we could even include a mechanism for specifying the exact
> >size that the object should be displayed. For example, we could use
> >something like:
> >
> ><img src="http://www.coteindustries.com/dogs/images/dogs4.jpg" width="380"
> > height="260" border="1">
> >
> >or
> >
> ><img src="http://www.artofeurope.com/velasquez/vel2.jpg">
>
> Now that is a good idea. In a .uof file specifically for the purpose, a
> line beginning with a < character could be used to indicate a web based
> reference, or a local reference, for the object, using exactly the same
> format as is used in an HTML file.
>
> If the line does not start with a < character, then it is simply a file name
> in the same directory as the .uof file, as I suggested originally. This
> would mean that where, say, a .uof file were broadcast upon a telesoftware
> service that the Java program (also broadcast) analysing the file names in
> the .uof file need not necessarily be able to decode lines starting with a <
> character so that the Java program does not need to have the software for
> that decoding in it, yet the same .uof file specification could be used,
> both in a telesoftware service and on the web, where a more comprehensive
> method of referencing objects were needed.
>
> >> I can imagine that such a widely used practice might be helpful in
> >> bridging the gap between being able to use a plain text file or maybe
> >> having to use some expensive wordprocessing package.
> >
> >And maybe someone will write cheaper software -- we could call it a
> >"browser" -- that could even be distributed for free, so that people
> >could make use of this convention for viewing objects correctly
> >distributed with respect to the text they are embedded in.
>
> Indeed, except not call it a browser as the name is already in widespread
> use for HTML browsers and might cause confusion. Analysing a .uof file
> would be a much less computational task than analysing the complete syntax
> of HTML files.
>
> >Yes, yes, I think this is an idea which could fly.
> >
> >--Ken
> >
>
> Good. It is a solution which could be very useful for people writing
> programs in Java, Pascal and C and so on which programs take in plain text
> files and process them for such purposes as producing a desktop publishing
> package.
>
> Hopefully the Unicode Technical Committee will be pleased to add a .uof
> format file specification into the set of Unicode documents so that the
> U+FFFC code can be used in an effective manner. The idea could be that if a
> .uof file is processed then the rules of .uof files apply in that situation,
> so that if a .uof file is not being processed, then the rules for .uof files
> do not apply, therefore there is no question that the publication of a .uof
> file specification by the Unicode Consortium would prejudice the rights of
> anyone to use the U+FFFC character in any other manner.
>
> Publication of such a .uof file specification would also prevent U+FFFC
> being made into a noncharacter and keep the facility of using the U+FFFC
> character in interchanged documents available for all, whether they choose
> to use the .uof file format or some other format for explaining the meaning
> of any U+FFFC codes in a given document.
>
> Could this be discussed at the Unicode Technical Committee meeting next week
> please?
>
> William Overington
>
> 16 August 2002
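
For concreteness, the convention described in the quoted message could be
read roughly as in the Python sketch below. It is only an illustration of
the proposal as quoted, not part of any Unicode specification. The function
names (parse_uof, is_markup_reference, pair_anchors) are hypothetical, as
are the assumptions that each reference sits on a single line, that blank
lines are ignored, and that a count mismatch is treated as an error; the
proposal itself does not say.

```python
OBJECT_REPLACEMENT = "\ufffc"  # U+FFFC OBJECT REPLACEMENT CHARACTER


def parse_uof(manifest_text):
    """Split a .uof manifest into (text_file_name, object_references).

    Per the quoted proposal, the first line names the text file and each
    following line names one object. Skipping blank lines is an assumption.
    """
    lines = [ln.strip() for ln in manifest_text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty .uof manifest")
    return lines[0], lines[1:]


def is_markup_reference(reference):
    """True if the line starts with '<' (an HTML-style reference);
    otherwise it is a file name in the same directory as the .uof file."""
    return reference.startswith("<")


def pair_anchors(text, references):
    """Pair each U+FFFC in the text, in order, with its reference.

    Raising on a count mismatch is an assumption; the proposal does not
    define what should happen in that case.
    """
    offsets = [i for i, ch in enumerate(text) if ch == OBJECT_REPLACEMENT]
    if len(offsets) != len(references):
        raise ValueError(
            f"{len(offsets)} U+FFFC anchors but {len(references)} references"
        )
    return list(zip(offsets, references))


if __name__ == "__main__":
    # The story7.uof contents from the quoted example.
    manifest = "story7.txt\nhorse.gif\ndog.gif\npainting.jpg\n"
    name, refs = parse_uof(manifest)
    story = "A horse \ufffc met a dog \ufffc in a painting \ufffc."
    for offset, ref in pair_anchors(story, refs):
        kind = "markup" if is_markup_reference(ref) else "local file"
        print(name, offset, ref, kind)
```

Note that any code handling U+FFFC in this way has to know that a .uof
file is in play before it can decide what the character means, which is
the context dependence objected to above.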

-- 
-------------------------------------------------------------
Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
Xen Master                          http://www.i18nGuy.com
                         
XenCraft		            http://www.XenCraft.com
Making e-Business Work Around the World
-------------------------------------------------------------



This archive was generated by hypermail 2.1.2 : Fri Aug 16 2002 - 01:31:54 EDT