Re: terminology: plaintext (was Re: unicode Digest V5 #149)

From: Asmus Freytag
Date: Fri Jun 24 2005 - 23:11:44 CDT


    The HTML source file does not contain instructions on how to style it.
    Therefore, the source file as such, just like a C++ (.cpp) source file, is
    definitely plain text.

    Tools that only know about plain text can perform meaningful operations on
    HTML source, from making it visible to a programmer to printing it (again,
    as source).
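    As a rough illustration (a sketch, not anything from the original thread),
    a tool that knows nothing about HTML can still do useful work on HTML
    source, because the source is just lines of text. The function name
    `find_lines` and the sample document are hypothetical.

```python
def find_lines(source: str, needle: str) -> list[tuple[int, str]]:
    """Return (line number, line) pairs that contain `needle`.

    This is a purely plain-text operation: it never parses tags, it just
    scans the characters line by line, the way grep would.
    """
    return [
        (number, line)
        for number, line in enumerate(source.splitlines(), start=1)
        if needle in line
    ]


# Hypothetical HTML source, treated here as nothing more than text.
html_source = "<html>\n<body>\n<p>Hello, <b>world</b></p>\n</body>\n</html>"

for number, line in find_lines(html_source, "<b>"):
    print(number, line)
```

    The search finds the markup itself (here on line 3), which is exactly what
    a programmer looking at the source wants to see.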

    In the late '80s, Microsoft had a format for online help that used a rich
    text (in fact, hypertext) format hosted on top of another rich text format.

    The help source files were special Word files in which certain character
    formats took on special meanings. I forget the details, but essentially,
    imagine, for example, using the footnote style to designate links.

    For the editor (Word), the files were regular Word files (although to a
    reader the footnotes looked awfully funny). Certain styles, like italic and
    bold, would be displayed by Word just the same way as in the help system.

    To the help compiler, the funny footnotes had special meaning; therefore
    these help files were a 'representation of rich text expressed in a rich
    text format'.

    You all know that Word files themselves can be serialized in a plain text
    format called RTF. So in that system you had three levels; the bottom-most
    was a plain text source file (RTF) that any text editor could view and
    modify.

    As long as the modifications were legal RTF, Word could convert these files to
    its own proprietary format (DOC), without knowing about their use in the help
    system. It would display the files and allow editing.

    As long as your modifications conformed to the additional restrictions and
    conventions imposed by the help system, the help compiler could convert
    these files to the proprietary format used by the online help viewer.

    Because of real-life examples like that, I find it far more helpful to
    consider HTML source files as such to be plain text.

    Syntax coloring really doesn't change that.

    Here's a simple example: if I wrote a 'syntax coloring' program for
    English, your definition would mean that no English text is 'plain text'
    any longer.

    At that point, plain text ceases to be a useful distinction. Therefore,
    I prefer to use the term in a way that signifies something that I find
    meaningful, to wit, as a file format, it is identical to what a plain
    text editor expects. By extension, and that is how it is often used,
    plain text describes those features of a text that you can make visible
    with such a tool.

    Once you agree to that, you can go a step further and label any stretch of
    data that could be copied as is into a plain text file as a plain text run
    (or, if in memory, a plain text buffer).

    Now you get to an interesting point: HTML source files are plain text.
    The data in the HTML source file are a representation of rich text,
    and the HTML parse tree consists of a combination of style nodes
    and plain text buffers.
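    To make that picture concrete, here is a minimal sketch (an illustration,
    not from the thread) that parses HTML source into a tree whose children are
    either element nodes or plain text runs. Python's standard `html.parser`
    only tokenizes, so the small tree builder below is an assumption; `Node`,
    `TreeBuilder` and `text_runs` are hypothetical names. It simplifies the
    picture by treating every element as a "style node" and does no real
    error recovery.

```python
from html.parser import HTMLParser


class Node:
    """An element node (e.g. <b>), standing in for a 'style node'."""

    def __init__(self, tag: str):
        self.tag = tag
        self.children = []  # mix of Node objects and plain text strings


class TreeBuilder(HTMLParser):
    """Builds a simple tree from the tag and text events HTMLParser emits."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest open element with this name, if any;
        # an unmatched end tag is simply ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Character data between tags: a plain text run.
        self.stack[-1].children.append(data)


def text_runs(node):
    """Yield the plain text runs of the tree in document order."""
    for child in node.children:
        if isinstance(child, str):
            yield child
        else:
            yield from text_runs(child)


builder = TreeBuilder()
builder.feed("<p>Hello, <b>bold</b> world</p>")
builder.close()
print(list(text_runs(builder.root)))
```

    Each string the sketch collects could be copied as is into a plain text
    file, which is what makes it a plain text run in the sense above; the
    markup survives only as the structure of the `Node` objects.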

    I find that picture very clear and not confusing at all.


    At 02:58 PM 6/24/2005, François Yergeau wrote:
    >Asmus Freytag a écrit :
    >>HTML is a representation of rich text expressed in a plain text format.
    >I think this continues the confusion. .....

    This archive was generated by hypermail 2.1.5 : Fri Jun 24 2005 - 23:13:44 CDT