From: Asmus Freytag (email@example.com)
Date: Thu Jul 29 2010 - 00:47:13 CDT
On 7/28/2010 9:32 PM, Doug Ewell wrote:
> Murray Sargent <murrays at exchange dot microsoft dot com> wrote:
>> It's worth remembering that plain text is a format that was
>> introduced due to the limitations of early computers. Books have
>> always been rendered with at least some degree of rich text. And due
>> to the complexity of Unicode, even Unicode plain text often needs to
>> be rendered with more than one font.
> I disagree with this assessment of plain text. When you consider the
> basic equivalence of the "same" text written in longhand by different
> people, typed on a typewriter, finger-painted by a child,
> spray-painted through a stencil, etc., it's clear that the "sameness"
> is an attribute of the underlying plain text. None of these examples
> has anything to do with computers, old or new.
That may be, but the design of Unicode plain text is based on the
concept of plain text in computers, and what that means was hashed out
long before Unicode arrived on the scene. To a large measure, what
Unicode did was extend that concept to additional writing systems (and
to historic or rarely used nooks and crannies of some of the existing
In the process, your definition of plain text was pulled out, dusted
off, and used as a philosophical underpinning of the enterprise - but
the technologists in the effort did not first discard any notions of
computer-based plain text before proceeding. In other words, claiming a
clean break between the existing "ASCII" plain text and Unicode would be
> I do agree that rich text has existed for a long time, possibly as
> long as plain text (though I doubt that, when you consider really
> early writing technologies like palm leaves), but I don't think that
> refutes the independent existence of plain text. And I don't think
> the need to use more than one font to render some Unicode text implies
> it isn't plain text. I think that has more to do with aesthetics (a
> rich-text concept) and technical limits on font size.
No, it's not headings and the like. If you pull together a selection of
ordinary books in the English language and remove rich text attributes,
you will find a considerable fraction of the works will exhibit subtle
changes in meaning - these works require italics to mark emphasis in
places where the same sequence of words can be read in different ways.
Scholarly works require italics for citations. Absent italics, some
other method would need to be introduced to mark titles; without any
designation, there can and will be ambiguities.
Hence, not all texts can be expressed as plain text.
If you take a German text, rendered (by a human typesetter) in Fraktur
and rendered (by a later typesetter) in Antiqua, you will find that the
second version has less information in it when you encode both texts on
a computer. And many texts that can be represented as plain text if they
are to be rendered in Antiqua cannot be plain text if they are to be
rendered according to the rules of typesetting a work in the Fraktur
style - again, we are talking ordinary running text, no headings,
bibliographies or anything.
The additional information is not of an aesthetic or stylistic nature,
but tied to the meaning of certain words - that which Unicode calls
In other words, the text, as rendered in Antiqua, allows for potential
ambiguities - not necessarily fatal ones, because context may easily
resolve them, but they are there, nevertheless.
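To make the Fraktur point concrete, here is a minimal sketch. It assumes that the distinction at issue is the one between long s (U+017F) and round s, which Fraktur typesetting keeps apart and Antiqua normally does not; the message above does not name a specific character, so both that choice and the helper name to_antiqua are illustrative assumptions only.

```python
# Assumption: the long s / round s contrast stands in for the kind of
# information a Fraktur rendering carries; the original message names no character.
LONG_S = "\u017f"  # U+017F LATIN SMALL LETTER LONG S

def to_antiqua(text: str) -> str:
    """Hypothetical helper: fold long s to round s, as Antiqua setting usually does."""
    return text.replace(LONG_S, "s")

# Two different German compounds that Fraktur rules spell differently:
guard_room = "Wach" + LONG_S + "tube"  # Wach + Stube: long s opens the second part
wax_tube = "Wachstube"                 # Wachs + Tube: round s closes the first part

# As encoded for Fraktur, the two words are distinct strings ...
assert guard_room != wax_tube
# ... but after folding to Antiqua they become the same sequence of characters.
assert to_antiqua(guard_room) == to_antiqua(wax_tube) == "Wachstube"
```

After folding, only context can tell a reader which of the two words was meant.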
This is just one example of how the concept of an abstract content of a
piece of text is not nearly as clear-cut as you might think.
By contrast, the definition of Unicode plain text is
straightforward: a sequence of Unicode characters without any style information.