Re: definition of plain text

From: Ken Whistler
Date: Fri, 14 Oct 2011 11:01:55 -0700

On 10/13/2011 10:49 PM, Peter Cyrus wrote:
> Is there a definition or guideline for the distinction between plain
> text and rich text?

I think where you may be getting hung up is trying to define plain
text versus rich text in terms of the content and/or appearance of
the text (i.e. the outcome), instead of the storage and processing
of the text.

The following string is plain text:

3<sup>2</sup>

It contains 13 characters. We can find all their Unicode code points and
stuff them in a 13-element array of 16-bit Unicode code units.

The following string is also plain text:

3²

It contains 2 characters. We can find all their Unicode code points and
stuff them in a 2-element array of 16-bit Unicode code units.
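
As a rough illustration of "stuffing code points into an array of 16-bit
code units", here is a minimal Python sketch. The two string literals are
assumptions: they are chosen to match the stated lengths (13 and 2
characters) and the HTML example discussed in this message. The helper
name utf16_code_units is hypothetical.

```python
def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-be")  # big-endian, no byte order mark
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


# Assumed example strings (not confirmed by the archived message).
markup_string = "3<sup>2</sup>"   # 13 characters
superscript_string = "3\u00b2"    # DIGIT THREE + SUPERSCRIPT TWO

units_a = utf16_code_units(markup_string)
units_b = utf16_code_units(superscript_string)

print(len(units_a), [f"{u:04X}" for u in units_a])
print(len(units_b), [f"{u:04X}" for u in units_b])
```

Both results are just arrays of code units. Nothing in either array says
whether it is to be treated as markup or as content.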

The first string can, however, also be interpreted (and displayed) as
rich text, by applying a higher-level protocol (in this case, HTML),
which interprets certain sequences of characters as markup and others
as content. In that case, the first string would be interpreted
differently, and would be displayed as a "3" followed by a
superscript "2".
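
To make the "higher-level protocol" step concrete, here is a small sketch
using Python's standard html.parser module. It feeds the same 13-character
array to an HTML parser, which then splits it into markup and content. The
input string is an assumed reconstruction of the example, and the class
name MarkupSplitter is hypothetical.

```python
from html.parser import HTMLParser


class MarkupSplitter(HTMLParser):
    """Record which parts of the input are treated as markup vs. content."""

    def __init__(self):
        super().__init__()
        self.tags = []      # sequences interpreted as markup
        self.content = []   # sequences interpreted as content

    def handle_starttag(self, tag, attrs):
        self.tags.append(("start", tag))

    def handle_endtag(self, tag):
        self.tags.append(("end", tag))

    def handle_data(self, data):
        self.content.append(data)


source = "3<sup>2</sup>"  # assumed example string, 13 characters as plain text

splitter = MarkupSplitter()
splitter.feed(source)
splitter.close()

print("as plain text:", len(source), "characters")
print("markup:", splitter.tags)
print("content:", splitter.content)
```

The stored text is unchanged. Only the decision to run it through the
parser turns part of it into markup.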

The end results may end up "meaning" the same thing, and might even
display identically (although in this case most renderers will show them
a little differently), even though one is plain text and the other is
rich text.

If you stick to the notion that Unicode plain text consists of a sequence
of Unicode characters (in some encoding form) stuffed into an array
of code units, regardless of what format controls are included and what
they may "mean", you don't go far wrong. Rich text starts where you
either start interpreting some subset of the characters in the array
as markup according to a higher-level protocol, or you start applying
out-of-band information not actually stored in the text array but affecting
its formatting (or other aspects of its processing).
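
The second route to rich text, out-of-band information, might be sketched
as follows. The text array holds only "32", and the superscript lives in a
separate list of style runs. Everything here (StyleRun, to_html, the
"superscript" style name) is a hypothetical illustration, not an existing
API, and it assumes the runs do not overlap.

```python
import html
from dataclasses import dataclass


@dataclass(frozen=True)
class StyleRun:
    start: int   # index of the first styled character
    end: int     # index one past the last styled character
    style: str   # hypothetical style name, e.g. "superscript"


def to_html(text, runs):
    """Render plain text plus out-of-band style runs as an HTML fragment."""
    tag_for = {"superscript": "sup"}
    out = []
    pos = 0
    for run in sorted(runs, key=lambda r: r.start):
        out.append(html.escape(text[pos:run.start]))
        tag = tag_for[run.style]
        out.append(f"<{tag}>{html.escape(text[run.start:run.end])}</{tag}>")
        pos = run.end
    out.append(html.escape(text[pos:]))
    return "".join(out)


text = "32"                              # the text array: plain text only
runs = [StyleRun(1, 2, "superscript")]   # formatting kept outside the text

rendered = to_html(text, runs)
print(rendered)
```

Taken by itself, the text array is still plain text. It becomes rich
text only when processed together with the runs.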

UTR #20 is replete with examples of the borderline between plain
text and rich text, where you could, in principle, get the same outcome
either using Unicode plain text, or by using Unicode plain
text for the core content and applying markup. It isn't the outcome per
se that makes the difference -- it is how you represent the text and
process it to get there.

Received on Fri Oct 14 2011 - 13:07:40 CDT
