Re: Can a single text document use multiple character encodings?

From: Andrew Cunningham <lang.support_at_gmail.com>
Date: Sat, 31 Aug 2013 08:42:48 +1000

I can think of a few websites that mix legacy-encoded content within a UTF-8
document.

This is often done for practical reasons.

Alternatively, some mix Unicode and pseudo-Unicode in the same document.
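
As a rough illustration of what such a mixed document looks like at the byte
level, here is a minimal Python sketch. The sample strings, the choice of
cp1252 as the "legacy" encoding, and the helper name decode_segments are all
assumptions made up for this example; a real site would need some out-of-band
way of knowing where each segment starts and ends.

```python
# Hypothetical example: one byte stream whose segments use different encodings.
utf8_part = "Unicode text: žluťoučký ".encode("utf-8")
legacy_part = "legacy text: café".encode("cp1252")  # assumed legacy encoding
document = utf8_part + legacy_part


def decode_segments(data, segments):
    """Decode each (start, end, encoding) slice of data with its own encoding."""
    return "".join(data[start:end].decode(enc) for start, end, enc in segments)


# The segment boundaries must be known from somewhere outside the bytes themselves.
boundary = len(utf8_part)
mixed = decode_segments(
    document,
    [(0, boundary, "utf-8"), (boundary, len(document), "cp1252")],
)

# Treating the whole stream as UTF-8 fails: the cp1252 byte 0xE9 ("é")
# is not a valid UTF-8 sequence at that position.
try:
    document.decode("utf-8")
    whole_ok = True
except UnicodeDecodeError:
    whole_ok = False
```

The point of the sketch is only that the bytes carry no record of where the
encoding changes, so a reader that assumes a single encoding either fails or
produces mojibake for one of the parts.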

Andrew
On 30/08/2013 11:14 PM, "Ilya Zakharevich" <nospam-abuse_at_ilyaz.org> wrote:

> On Wed, Aug 28, 2013 at 07:07:23PM +0000, Costello, Roger L. wrote:
>
> > For example, can some text be encoded as UTF-8 while other text is
> encoded as UTF-16 - within the same document?
>
> I think it is a very interesting question. A Perl program is
> (obviously) a text document. On the other hand, in two minutes I
> could deduce a few ways to mix many different encodings into the same
> document. My current record is 5 different encodings; some of them
> are arbitrary, some of them should satisfy certain compatibility
> requirements (something like
> =cut CR
> and
> =pod CR
> being encoded the same in two encodings). And, on top of this, there
> is yet another way to mix encodings arbitrarily.
>
> The tricks are threefold:
>
> ◌ First, a Perl program is actually a mixture of 3 different
> documents: the program stream, the data-for-the-program stream,
> and the documentation stream. There are certain rules for
> interleaving them (except for DATA which should be at the end!),
> and there are documented ways to specify the encodings of the
> streams.
>
> ◌ Second, the string and regular-expression literals are
> “interpreted” by the lexer: there is a way for the program to
> specify a way to “massage” the literals before they are handed
> to the interpreter. This gives yet other ways for strings
> and/or regular expressions to be in a different encoding. (Note
> that this may lead to “doubly encoded” phenomena if the
> “ambient” encoding is not “raw”.)
>
> ◌ Third, there is a way to switch the encoding of a Perl program
> on the fly (at the end-of-line of current encoding).
>
> To be honest, I should have tested all this better before
> posting, but I did not. On the practical side, how is this useful?
> Having different encodings for DATA and the program, and/or for the
> documentation and the program, may be quite widely used. The other
> hacks may have been used at least in the (enormous!) Perl test suite.
>
> Ilya
>
>
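
The "doubly encoded" effect mentioned in the quoted message is not specific to
Perl. A small Python sketch of the general phenomenon follows; it assumes
Latin-1 as the wrongly applied "ambient" encoding, which is my choice for
illustration and not something the quoted message specifies.

```python
# Sketch of double encoding: UTF-8 bytes misread as Latin-1, then encoded again.
text = "é"
utf8_bytes = text.encode("utf-8")          # two bytes: 0xC3 0xA9
misread = utf8_bytes.decode("latin-1")     # each byte becomes its own character
double_encoded = misread.encode("utf-8")   # now four bytes instead of two

# The damage is reversible as long as no byte was lost along the way:
# undo the second encoding, then decode the original UTF-8 bytes properly.
repaired = double_encoded.decode("utf-8").encode("latin-1").decode("utf-8")
```

Here one character turns into two after the misreading, which is the typical
visible symptom of a literal having been encoded twice.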
Received on Fri Aug 30 2013 - 17:44:32 CDT
