Re: Can a single text document use multiple character encodings?

From: Richard Wordingham <richard.wordingham_at_ntlworld.com>
Date: Thu, 29 Aug 2013 00:08:15 +0100

On Wed, 28 Aug 2013 13:00:45 -0700
Stephan Stiller <stephan.stiller_at_gmail.com> wrote:

> People will occasionally mention ISO/IEC 2022, which can be thought
> of as a meta-encoding or encoding template or encoding constructor,
> but in the normal case a sensible position is that a document making
> use of multiple encodings is no longer plaintext ("single text
> document" in this thread's subject line). And – yes – "plaintext" is
> a fuzzy notion around the edges, as others have successfully argued
> in the past.

I see no reason not to regard a text file encoded using ISO/IEC 2022 as
plain text. A good many Emacs Lisp files are in an ISO/IEC 2022
encoding, and they are as plain text as program code can be.

However, formally, any such document is in a single, highly stateful
encoding, just like a document in the SCSU encoding. To complicate
matters, most documents encoded using ISO/IEC 2022 rely on default
initial settings, so to interpret them it is not enough to say that they
are in an ISO/IEC 2022 encoding; one must specify the particular
encoding, which then defines the initial states.
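
To illustrate the statefulness, here is a minimal Python sketch. It
assumes ISO-2022-JP as the particular encoding (my choice for the
example, as exposed by Python's "iso2022_jp" codec) and assumes the
codec brackets the text with the two escape sequences named in the
constants. The same payload bytes decode differently depending on
whether the decoder has seen the designating escape or is still in the
default initial (ASCII) state.

```python
# Sketch only: ISO-2022-JP is an assumed example of an ISO/IEC 2022 encoding.
text = "日本"
encoded = text.encode("iso2022_jp")

# Escape sequences assumed to bracket the text in this codec's output.
ESC_TO_JIS = b"\x1b$B"    # designate JIS X 0208 (two bytes per character)
ESC_TO_ASCII = b"\x1b(B"  # return to ASCII

# Strip the escapes, leaving only the payload bytes.
payload = encoded[len(ESC_TO_JIS):-len(ESC_TO_ASCII)]

# With the escapes, the decoder switches state and recovers the text.
with_state = encoded.decode("iso2022_jp")

# Without them, the decoder stays in its initial ASCII state and reads
# the very same payload bytes as ASCII characters.
without_state = payload.decode("iso2022_jp")

print(repr(encoded))
print(repr(with_state))
print(repr(without_state))
```

The payload alone is not enough to recover the text; one needs both the
escape sequences and the knowledge of which state the encoding starts
in.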

An e-mail as transmitted, and concatenations thereof, comes very close
to being a document in multiple character encodings. Whether it is
plain text depends on one's viewpoint: would one see it as containing
binary attachments or as containing textual representations of binary
files?
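
A rough sketch of the e-mail case, using Python's standard email
package. The message contents, the choice of charsets, and the explicit
"8bit" transfer encoding are all assumptions made for illustration. The
transmitted form carries one text part in ISO-8859-1, another in UTF-8,
and a binary attachment represented textually (base64 by default).

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

# Hypothetical message: two text parts in different charsets plus a
# small binary attachment.
msg = EmailMessage()
msg["Subject"] = "Mixed encodings"
msg.set_content("café", charset="iso-8859-1", cte="8bit")
msg.add_attachment("日本語", charset="utf-8", cte="8bit")
msg.add_attachment(b"\x00\x01\xff",
                   maintype="application", subtype="octet-stream")

# The message as transmitted: a single byte stream.
wire = msg.as_bytes()

# Parse it back and look at each part's declared charset and content.
parsed = message_from_bytes(wire, policy=policy.default)
parts = list(parsed.iter_parts())
charsets = [part.get_content_charset() for part in parts]
contents = [part.get_content() for part in parts]

print(charsets)
print(contents)
```

Each part is decoded according to its own Content-Type header, so the
byte stream as a whole has no single character encoding, while the
attachment arrives as ASCII text standing in for binary data.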

Richard.
Received on Wed Aug 28 2013 - 18:10:12 CDT
