Re: terminology: plaintext (was Re: unicode Digest V5 #149)

From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Fri Jun 24 2005 - 23:11:44 CDT

Next message: James Kass: "Re: terminology: plaintext (was Re: unicode Digest V5 #149)"

Previous message: Richard Wordingham: "Re: Tamil Collation vs Transliteration/Transcription Enc"
In reply to: François Yergeau: "Re: terminology: plaintext (was Re: unicode Digest V5 #149)"
Next in thread: Sinnathurai Srivas: "Re: terminology: plaintext (was Re: unicode Digest V5 #149)"

The HTML source file does not contain instructions on how to style it.
Therefore, the source file as such, just like a .cpp file, is definitely
plain text.

Tools that only know about plain text can perform meaningful operations on
HTML source, from making it visible to a programmer to printing it (again
as source).
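As a minimal sketch of this point, assuming Python and a hypothetical helper named `grep_lines`, here is a plain-text tool that numbers and searches the lines of HTML source. It knows nothing about HTML; to it, the markup is just characters.

```python
def grep_lines(source: str, needle: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs whose line contains needle.

    Purely a plain-text operation: tags are treated as ordinary characters.
    """
    return [
        (number, line)
        for number, line in enumerate(source.splitlines(), start=1)
        if needle in line
    ]


html_source = """<html>
<body>
<p>Plain <b>text</b> here.</p>
</body>
</html>"""

for number, line in grep_lines(html_source, "<b>"):
    print(number, line)
```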

In the late '80s Microsoft had a format for online help that used a rich
text format (in fact, hypertext) hosted on top of another rich text format.

The help source files were special Word files, in which certain character
styles took on special meanings. I've forgotten the details, but
essentially, imagine, for example, using the footnote style to designate
links.

To the editor (Word), the files were regular Word files (although to a reader
the footnotes looked awfully funny). Certain styles, like italic and bold,
would be displayed by Word just the same way as in the help system.

To the help compiler, the funny footnotes had special meaning; therefore these
help files were a 'representation of rich text expressed in a rich text
format'.

You all know that Word files themselves can be serialized in a plain text
format called RTF. So in that system you had three levels, the bottom-most
being a plain text source file (RTF) that any text editor could view and
modify.

As long as the modifications were legal RTF, Word could convert these files to
its own proprietary format (DOC), without knowing about their use in the help
system. It would display the files and allow editing.

As long as your modifications conformed to the additional restrictions and
conventions imposed by the help system, the help compiler could convert these
files to the proprietary format used by the online help viewer.

Because of real-life examples like that, I find it far more helpful to
consider HTML source files, as such, to be plain text.

Syntax coloring really doesn't change that.

Here's a simple example: if I wrote a 'syntax coloring' program for English,
your definition would mean that no English text is 'plain text' any longer.
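To make the argument concrete, here is a toy 'syntax coloring' program for English. It is a hypothetical sketch in Python that assumes ANSI terminal escape codes as the coloring mechanism and an arbitrary list of "keywords". The coloring exists only in a display copy; the underlying string is never modified, and stripping the codes recovers it exactly.

```python
import re

# ANSI escape sequences (assumed display mechanism): bold blue on, then reset.
COLOR_ON = "\x1b[1;34m"
COLOR_OFF = "\x1b[0m"

# Hypothetical "keywords" for English: a few common function words.
KEYWORDS = {"the", "a", "an", "and", "of"}


def colorize(text: str) -> str:
    """Return a display copy of text with keywords wrapped in color codes."""
    def paint(match):
        word = match.group(0)
        if word.lower() in KEYWORDS:
            return f"{COLOR_ON}{word}{COLOR_OFF}"
        return word

    return re.sub(r"[A-Za-z]+", paint, text)


def strip_color(display: str) -> str:
    """Remove the ANSI color codes, leaving only the original characters."""
    return re.sub(r"\x1b\[[0-9;]*m", "", display)


sentence = "The cat sat on a mat."
shown = colorize(sentence)
print(shown)
```

The source sentence stays plain text throughout; only the view of it is colored.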

At that point, plain text ceases to be a useful distinction. Therefore,
I prefer to use the term in a way that signifies something I find
meaningful, to wit: plain text is a file format identical to what a plain
text editor expects. By extension, and that is how it is often used,
plain text describes those features of a text that you can make visible
with such a tool.

Once you agree to that, you can go a step further and label any stretch
of data that could be copied as is into a plain text file as a plain text
run (or, if in memory, a plain text buffer).
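One way to operationalize "could be copied as is into a plain text file" is sketched below in Python. The criterion is an assumption, not a definition from this post: here a run counts as plain text if it decodes in a given encoding (UTF-8 by default) and contains no control characters other than tab, line feed, and carriage return. The function name `is_plain_text_run` is hypothetical.

```python
import unicodedata

# Assumed whitelist: the only control characters this sketch lets
# a plain text editor accept.
ALLOWED_CONTROLS = {"\t", "\n", "\r"}


def is_plain_text_run(data: bytes, encoding: str = "utf-8") -> bool:
    """Hypothetical test: can these bytes be copied as is into a plain text file?"""
    try:
        text = data.decode(encoding)
    except UnicodeDecodeError:
        return False
    for ch in text:
        # Unicode general category "Cc" covers the C0 and C1 control characters.
        if unicodedata.category(ch) == "Cc" and ch not in ALLOWED_CONTROLS:
            return False
    return True


print(is_plain_text_run(b"<p>Hello, <i>world</i></p>\n"))  # HTML source
print(is_plain_text_run(b"{\\rtf1\\ansi Hello\\par}"))       # RTF source
print(is_plain_text_run(b"\xd0\xcf\x11\xe0\x00\x00"))        # binary data
```

Under this criterion both the HTML and the RTF source qualify as plain text runs, while binary data does not.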

Now you get to an interesting point: HTML source files are plain text.
The data in the HTML source file are a representation of rich text,
and the HTML parse tree consists of a combination of style nodes
and plain text buffers.
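The picture of a parse tree made of style nodes and plain text buffers can be sketched with Python's standard-library `html.parser` module. The names `StyleNode`, `TreeBuilder`, and `text_runs` are hypothetical, and the sketch assumes well-formed, properly nested markup (no handling of void elements or error recovery).

```python
from html.parser import HTMLParser


class StyleNode:
    """A markup element; its children are StyleNodes or plain text buffers (str)."""

    def __init__(self, tag: str):
        self.tag = tag
        self.children: list = []


class TreeBuilder(HTMLParser):
    """Build a tree of style nodes and plain text buffers from HTML source."""

    def __init__(self):
        super().__init__()
        self.root = StyleNode("#document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = StyleNode(tag)
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the current element only if it matches (assumes nesting is correct).
        if len(self.stack) > 1 and self.stack[-1].tag == tag:
            self.stack.pop()

    def handle_data(self, data):
        # Character data between tags becomes a plain text buffer.
        self.stack[-1].children.append(data)


def text_runs(node: StyleNode) -> list[str]:
    """Collect the plain text buffers in document order."""
    runs = []
    for child in node.children:
        if isinstance(child, str):
            runs.append(child)
        else:
            runs.extend(text_runs(child))
    return runs


builder = TreeBuilder()
builder.feed("<p>Plain <b>bold</b> text</p>")
builder.close()
print(text_runs(builder.root))
```

The input is a plain text string, the tree's interior nodes carry the styling, and its leaves are ordinary strings.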

I find that picture very clear and not confusing at all.

A./

At 02:58 PM 6/24/2005, François Yergeau wrote:
>Asmus Freytag a écrit :
>>HTML is a representation of rich text expressed in a plain text format.
>
>I think this continues the confusion. .....
