Re: Can a single text document use multiple character encodings?

From: Asmus Freytag <asmusf_at_ix.netcom.com>
Date: Wed, 28 Aug 2013 15:35:24 -0700

On 8/28/2013 1:00 PM, Stephan Stiller wrote:
>> For Web formats (HTML, etc.), the answer is "no".
> The obvious follow-up to the list: It'd be interesting to know where
> the answer is "yes".
>
> People will occasionally mention ISO/IEC 2022, which can be thought of
> as a meta-encoding or encoding template or encoding constructor, but
> in the normal case a sensible position is that a document making use
> of multiple encodings is no longer plaintext ("single text document"
> in this thread's subject line). And – yes – "plaintext" is a fuzzy
> notion around the edges, as others have successfully argued in the past.
>
The original question was about combining UTF-8 and UTF-16 in the same
document. As plain text is usually represented as an array of code
units, mixing two different code unit sizes, 8-bit and 16-bit, is
particularly unlikely to be supported, with the potential exception of
some richly structured data formats which (if they existed) might handle
such situations. None are generally known that fit the "single text
document" qualification.
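
To make the code unit point concrete, here is a minimal Python sketch.
The helper names and the sample string are made up for illustration.
It lists the code units of one string in both encoding forms, then
shows what a UTF-8 reader sees when UTF-16 bytes are appended to UTF-8
bytes with nothing to mark the switch.

```python
import struct

def utf8_code_units(text):
    """Return the 8-bit code units of text encoded as UTF-8."""
    return list(text.encode("utf-8"))

def utf16_code_units(text):
    """Return the 16-bit code units of text encoded as UTF-16."""
    raw = text.encode("utf-16-le")  # little-endian, no byte order mark
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))

# Illustrative sample: U+0061, U+00E9, U+20AC, U+1F600
sample = "a\u00e9\u20ac\U0001F600"
units8 = utf8_code_units(sample)    # 1 + 2 + 3 + 4 code units
units16 = utf16_code_units(sample)  # 1 + 1 + 1 + 2 code units

# Concatenating the two forms gives a byte stream with no in-band marker
# for the switch; a UTF-8 reader takes the zero bytes of the UTF-16 part
# as U+0000 characters instead of reporting an error.
mixed = "abc".encode("utf-8") + "abc".encode("utf-16-le")
misread = mixed.decode("utf-8")

print([hex(u) for u in units8])
print([hex(u) for u in units16])
print(repr(misread))
```

A reader that treats the buffer as an array of one kind of code unit
has no way to find the boundary, so any format allowing this would need
out-of-band structure to say where each part begins.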

ISO 2022 allows switching among character sets in mid-stream, but as far
as I remember (I haven't had to think about this since Unicode came
around) the code unit is still a byte, except that sometimes pairs of
bytes are used. As I remember, ISO 2022 was still far from widely
supported in the late '80s, and practically not at all in the
fast-growing PC sector.
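
As one illustration of mid-stream switching, the sketch below uses
Python's "iso2022_jp" codec. ISO-2022-JP is only one profile built on
ISO 2022, and the sample string is an assumption chosen for the
example. Escape sequences switch between ASCII and a two-byte set,
while every code unit in the stream stays a single 7-bit byte.

```python
ESC = b"\x1b"

# Illustrative sample: ASCII "A", two kanji (U+65E5, U+672C), ASCII "B"
text = "A\u65e5\u672cB"
encoded = text.encode("iso2022_jp")

# Every byte is 7-bit; the kanji are carried as pairs of bytes between
# an escape that selects the two-byte set and one that returns to ASCII.
all_seven_bit = all(b < 0x80 for b in encoded)
escape_count = encoded.count(ESC)
decoded = encoded.decode("iso2022_jp")

print(encoded)
print(all_seven_bit, escape_count)
```

The decoder has to track which set is currently selected, so the
meaning of a byte depends on the escape sequences that came before it.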

The reason for that, the Unicode advocates think, is that it's just too
unwieldy.

As for mixing UTF-8 and UTF-16, the conversion is lossless and so
trivial that most people would just convert the data into one or the
other of these formats and not bother to have both. So in the unlikely
case that a format existed for a "single document" where you could do
that, it would seem even less likely that it would be used for the
example given in the question.
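
A minimal sketch of that conversion in Python, assuming well-formed
input and UTF-16 in little-endian order without a byte order mark. The
function names are made up for the example.

```python
def utf8_to_utf16(data):
    """Convert well-formed UTF-8 bytes to UTF-16 (little-endian, no BOM)."""
    return data.decode("utf-8").encode("utf-16-le")

def utf16_to_utf8(data):
    """Convert well-formed UTF-16 (little-endian, no BOM) bytes to UTF-8."""
    return data.decode("utf-16-le").encode("utf-8")

# Illustrative sample covering 1-, 2-, 3- and 4-byte UTF-8 sequences
original = "a\u00e9\u20ac\U0001F600".encode("utf-8")
as_utf16 = utf8_to_utf16(original)
round_trip = utf16_to_utf8(as_utf16)

print(round_trip == original)
```

Both forms encode the same sequence of code points, which is why the
round trip gives back the original bytes.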

A./
Received on Wed Aug 28 2013 - 19:21:58 CDT
