Re: terminology: plaintext (was Re: unicode Digest V5 #149)

From: Asmus Freytag (
Date: Fri Jun 24 2005 - 16:06:14 CDT


    Gregg's analysis is not something we should follow in the Unicode Standard
    (I think. I'm having a hard time following it, actually).

    HTML is a representation of rich text expressed in a plain text format.

    When you view and edit HTML source, you are accessing it as plain text.

    However, the information described by HTML is rich text, which in
    turn consists of the stuff between the ">" and "<" to which information
    (the "<" and ">" and what they enclose) has been added.

    Therefore, when presented to an HTML parser, HTML is decidedly *not* plain
    text.
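
    As a rough illustration of the two views, here is a minimal Python
    sketch using the standard library's html.parser module. The class name
    TextAndMarkup and the sample string are made up for this example. The
    same string is plain text when handled as a str, but a parser splits it
    into text content and the added markup information.

```python
from html.parser import HTMLParser


class TextAndMarkup(HTMLParser):
    """Hypothetical helper: separates text content from markup events."""

    def __init__(self):
        super().__init__()
        self.text = []    # the stuff between ">" and "<"
        self.markup = []  # the "<", ">" and what they enclose

    def handle_starttag(self, tag, attrs):
        self.markup.append(("start", tag, attrs))

    def handle_endtag(self, tag):
        self.markup.append(("end", tag))

    def handle_data(self, data):
        self.text.append(data)


# Viewed as source, this is just a sequence of characters (plain text).
source = "<p>Plain <b>text</b> here.</p>"

# Viewed through a parser, it is text plus added information (rich text).
parser = TextAndMarkup()
parser.feed(source)
parser.close()

text = "".join(parser.text)
print(len(source), "characters of source")
print("text content:", text)
print("markup events:", parser.markup)
```

    The source string and the recovered text differ, which is the
    distinction being drawn: editing the source is plain-text editing, while
    what the parser reconstructs is not plain text.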

    If I take any other (binary) rich text format, and dump it with a binary
    editor into a string of 2-digit ASCII hex-numbers, that does turn it into
    a plain text serialization of the information, but does not turn the
    text contained in the rich text into plain text.
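
    A small Python sketch of that hex-dump case, under made-up assumptions:
    the byte layout below is invented for illustration and does not
    correspond to any real rich text format.

```python
# Hypothetical binary "rich text" record: two control bytes, the text "Hi",
# and a terminator byte. The layout is an assumption for this example only.
payload = bytes([0x01, 0x02]) + "Hi".encode("ascii") + bytes([0xFF])

# Dump it as 2-digit ASCII hex numbers separated by spaces.
hex_dump = " ".join(f"{b:02x}" for b in payload)

# The dump is a plain text serialization: pure ASCII, editable as text,
# and it round-trips back to the original bytes.
is_ascii = all(ord(ch) < 128 for ch in hex_dump)
round_trip = bytes.fromhex(hex_dump)

print(hex_dump)
print("ASCII only:", is_ascii)
print("round-trips:", round_trip == payload)
# The contained text "Hi" is not readable as such in the dump.
print("contains 'Hi':", "Hi" in hex_dump)
```

    The dump can be handled by any plain text editor, yet the formatted
    text it carries is no more plain text than it was in binary form.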

    The point is that the HTML source is not the same as the HTML text,
    even though they are related (by the HTML protocol).

    Syntax coloring and content-driven styles are even more of a red herring
    in this context.

    By the way, the point is well taken that an encoding that encodes text
    formatting information on the same level as character codes does not
    represent plain text. In other words, if there were a unique character
    code for each element (<p> etc.) or attribute in HTML, that would not
    make HTML plain text, any more than writing it in ASCII does.

