Re: Terminology question re ASCII from Jukka K. Korpela on 2013-10-29 (Unicode Mail List Archive)

From: Jukka K. Korpela <jkorpela_at_cs.tut.fi>
Date: Tue, 29 Oct 2013 08:19:09 +0200

2013-10-29 6:12, dzo_at_bisharat.net wrote:

> If one refers to "plain ASCII," or "plain ASCII text" or "...
> characters," should this be taken strictly as referring to the 7-bit
> basic characters, or might it encompass characters that might appear
> in an 8-bit character set (per the so-called "extended ASCII")?

In correct usage, “ASCII” refers to a specific standard, namely
“American National Standard for Information Systems -
Coded Character Sets - 7-Bit American National Standard Code
for Information Interchange (7-Bit ASCII)”, ANSI X3.4-1986, except
in historical presentations, where it might refer to predecessors of
that standard (earlier versions of ASCII).

In common usage, “ASCII” is also used to denote a) text data in general,
b) some 8-bit encoding that has ASCII characters as its 7-bit subset,
and c) other things. This can be very confusing, and that’s why the
standard has the parenthetic note “7-Bit ASCII” and why people often use
“US-ASCII” as the name of the ASCII encoding. The clarifying prefixes
are, however, also misleading in the sense that they suggest the
existence of other ASCIIs.
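
The 7-bit sense of the term can be checked mechanically. What follows is
only a minimal sketch in Python; the function name is_ascii and the sample
strings are made up for illustration and do not come from the standard.

```python
def is_ascii(data: bytes) -> bool:
    """Return True if every byte fits in 7 bits (0x00 to 0x7F)."""
    return all(b < 0x80 for b in data)


# Sample data (hypothetical): one pure ASCII string, one with an accented letter.
plain = "plain text".encode("ascii")
accented = "caf\u00e9".encode("iso-8859-1")  # e-acute becomes the single byte 0xE9

plain_is_ascii = is_ascii(plain)
accented_is_ascii = is_ascii(accented)
print(plain_is_ascii, accented_is_ascii)
```

Data that fails such a test is in some other encoding, whatever its
author chooses to call it.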

> I've always used the term "ASCII" in the 7-bit, 128 character sense,
> and modifying it with "plain" seems to reinforce that sense.
> (Although "plain text" in my understanding actually refers to lack of
> formatting.)

The attribute “plain” probably refers to plain text in the contexts
given. Once people make the mistake of writing “ASCII” when they mean
“text”, further confusion will be caused by attributes like “plain”,
which are indeed ambiguous.

> Reason for asking is encountering a reference to "plain ASCII"
> describing text that clearly (by presence of accented characters)
> would be 8-bit.

It probably means “plain text”. But it could also mean “text in an 8-bit
encoding”, if the author thinks of encodings like ISO 8859-1,
windows-1252, ISO 8859-2, cp-850, Mac Roman, etc., as “extended ASCII”
and even drops the attribute “extended”. It is conceivable that “plain
ASCII” is even used to emphasize that the text is not in a Unicode encoding.
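
Calling such encodings “ASCII” hides the fact that they disagree above
0x7F. A small Python sketch, assuming the codec names that ship with
Python's standard library and a made-up four-byte sample, shows one byte
being read differently depending on the encoding assumed:

```python
# Hypothetical sample: three ASCII bytes followed by one byte above 0x7F.
raw = b"caf\xe9"

# Codec names as spelled in Python's standard library.
encodings = ["iso-8859-1", "cp1252", "iso-8859-2", "cp850", "mac_roman"]
decoded = {name: raw.decode(name) for name in encodings}

# Print the code point of the last character, which keeps the output ASCII-only.
for name, text in decoded.items():
    print(f"{name:12} 0xE9 -> U+{ord(text[-1]):04X}")

# A strict ASCII decoder rejects the same data outright.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
```

The first three bytes decode identically everywhere, which is all that
“has ASCII as its 7-bit subset” promises.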

> The context is one of many situations where in attaching a document
> to an email, it is advisable to include an unformatted text version
> of the document in the body of the email. Never mind that the latter
> is probably in UTF-8 anyway(?) - the issue here is the terminology.

The proper term for plain text is “plain text”. The word “unformatted”
is often used, and might be seen as intuitively descriptive
(unformatted, as opposed to text that contains formatting like bolding,
colors, and different fonts), but it is risky. For one thing, plain text
is often displayed “as is” with respect to line breaks and indentation,
i.e. as “preformatted” (as in <pre> elements in HTML). Moreover, text
that is not plain text need not be formatted. It could be e.g. an XML
file where XML tags are used to mark up structural parts of the text,
without causing or implying any specific formatting in rendering.

Yucca
Received on Tue Oct 29 2013 - 01:20:56 CDT
