Re: Plain text

From: Asmus Freytag
Date: Thu Jul 29 2010 - 00:47:13 CDT


    On 7/28/2010 9:32 PM, Doug Ewell wrote:
    > Murray Sargent <murrays at exchange dot microsoft dot com> wrote:
    >> It's worth remembering that plain text is a format that was
    >> introduced due to the limitations of early computers. Books have
    >> always been rendered with at least some degree of rich text. And due
    >> to the complexity of Unicode, even Unicode plain text often needs to
    >> be rendered with more than one font.
    > I disagree with this assessment of plain text. When you consider the
    > basic equivalence of the "same" text written in longhand by different
    > people, typed on a typewriter, finger-painted by a child,
    > spray-painted through a stencil, etc., it's clear that the "sameness"
    > is an attribute of the underlying plain text. None of these examples
    > has anything to do with computers, old or new.
    That may be, but the way Unicode plain text is designed is based on the
    concept of plain text in computers, and what that means was hashed out
    long before Unicode arrived on the scene. To a large measure, what
    Unicode did was extend that concept to additional writing systems (and
    to historic or rarely used nooks and crannies of some of the existing
    writing systems).

    In the process, your definition of plain text was pulled out, dusted
    off, and used as a philosophical underpinning of the enterprise - but
    the technologists in the effort did not first discard any notions of
    computer-based plain text before proceeding. In other words, claiming a
    clean break between the existing "ASCII" plain text and Unicode would be
    a falsification.
    > I do agree that rich text has existed for a long time, possibly as
    > long as plain text (though I doubt that, when you consider really
    > early writing technologies like palm leaves), but I don't think that
    > refutes the independent existence of plain text. And I don't think
    > the need to use more than one font to render some Unicode text implies
    > it isn't plain text. I think that has more to do with aesthetics (a
    > rich-text concept) and technical limits on font size.
    No, it's not headings and the like. If you pull together a selection of
    ordinary books in the English language and remove rich text attributes,
    you will find a considerable fraction of the works will exhibit subtle
    changes in meaning - these works require italics to mark emphasis in
    places where the same sequence of words can be read in different ways.

    Scholarly works require italics for citations. Absent italics, some
    other method would need to be introduced to mark titles; without any
    designation, there can and will be ambiguities.

    Hence, not all texts can be expressed as plain text.

    If you take a German text, rendered (by a human typesetter) in Fraktur
    and rendered (by a later typesetter) in Antiqua, you will find, when you
    encode both texts on a computer, that the second version carries less
    information. And many texts that can be represented as plain text if they
    are to be rendered in Antiqua cannot be plain text if they are to be
    rendered according to the rules of typesetting a work in the Fraktur
    style - again, we are talking about ordinary running text, no headings,
    bibliographies or anything.

    The additional information is not of an aesthetic or stylistic nature,
    but tied to the meaning of certain words - that which Unicode calls
     In other words, the text, as rendered in Antiqua, allows for potential
    ambiguities - not necessarily fatal ones, because context may easily
    resolve them, but they are there, nevertheless.
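A minimal sketch of the kind of loss described above, assuming the commonly cited long-s example (the post itself does not name specific words). In Fraktur, "Wachſtube" (guard room, Wach + Stube) is written with LATIN SMALL LETTER LONG S (U+017F), while "Wachstube" (wax tube, Wachs + Tube) uses a round s. Antiqua typesetting drops the long s, so both come out as "Wachstube". Using NFKC normalization as a stand-in for that conversion is an assumption made only for illustration: it happens to fold U+017F to "s", but it is not how a typesetter works, and it does not model Fraktur ligature rules at all.

```python
import unicodedata

# Fraktur spelling: long s marks the syllable boundary Wach|stube (guard room).
guard_room = "Wach\u017Ftube"
# Round s marks the boundary Wachs|tube (wax tube).
wax_tube = "Wachstube"


def to_antiqua(text: str) -> str:
    """Hypothetical helper: approximate Antiqua spelling by NFKC folding.

    NFKC applies the compatibility decomposition of U+017F, mapping it to
    U+0073 's'; this only illustrates the information loss and is not a
    general Fraktur-to-Antiqua converter.
    """
    return unicodedata.normalize("NFKC", text)


# As Unicode plain text the two words are distinct strings.
print(guard_room == wax_tube)
# After folding, the distinction is gone and only context can resolve it.
print(to_antiqua(guard_room) == to_antiqua(wax_tube))
print(to_antiqua(guard_room))
```

Running this prints False, True, and "Wachstube": the plain-text encoding of the Fraktur version distinguishes the two words, while the Antiqua version leaves the ambiguity for context to resolve.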

    This is just one example of how the concept of the abstract content of a
    piece of text is not nearly as clear-cut as you might think.

    On the contrary, the definition of Unicode plain text is
    straightforward: a sequence of Unicode characters without any style information.

