Re: UTF-8 signature in web and email

From: Martin Duerst (duerst@w3.org)
Date: Tue May 15 2001 - 21:55:37 EDT

Next message: Marco Cimarosti: "RE: Ancient writing found in Turkmenistan"
Previous message: Roozbeh Pournader: "BIDI: possible fix"
Maybe in reply to: Roozbeh Pournader: "UTF-8 signature in web and email"
Next in thread: Michael \(michka\) Kaplan: "Re: UTF-8 signature in web and email"
Reply: Michael \(michka\) Kaplan: "Re: UTF-8 signature in web and email"

Hello Roozbeh

At 04:02 01/05/15 +0430, Roozbeh Pournader wrote:

>Well, I received a UTF-8 email from Microsoft's Dr International today. It
>was a "multipart/alternative", with both the "text/plain" and "text/html"
>in UTF-8. Well, nothing interesting yet, but the interesting point was
>that the HTML version had a UTF-8 signature, but the text version lacked
>it. So, the HTML version had it three times: mime charset as UTF-8,
>UTF-8 signature, and <meta> charset markup.

This is definitely overblown. There is about 5% of a justification
for having a 'signature' on a plain-text, standalone file: it is
somewhat easier to detect that the file is UTF-8 from the signature
than to read through the file and check the byte patterns (although
checking the byte patterns is an extremely good method to distinguish
UTF-8 from everything else). For self-labeled data (HTML, XML, CSS)
and in the context of MIME (with the charset parameter), a UTF-8
signature doesn't make sense at all.
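
To illustrate the two detection methods mentioned above, here is a
minimal Python sketch. The function names are made up for this example,
and it assumes the whole file is available as a bytes object. The
byte-pattern check simply leans on Python's strict UTF-8 decoder, which
rejects malformed sequences; it is one possible way to do the check,
not the only one.

```python
# The UTF-8 'signature' is U+FEFF encoded as UTF-8: EF BB BF.
UTF8_SIGNATURE = b"\xef\xbb\xbf"


def has_utf8_signature(data: bytes) -> bool:
    """Cheap check: does the data start with the three signature bytes?"""
    return data.startswith(UTF8_SIGNATURE)


def looks_like_utf8(data: bytes) -> bool:
    """Byte-pattern check: is the whole input well-formed UTF-8?

    Relies on the strict decoder, which raises on invalid lead bytes,
    missing continuation bytes, overlong forms, and so on.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

Note that pure ASCII also passes the byte-pattern check, since ASCII is
a subset of UTF-8. Text in a legacy 8-bit encoding that contains
non-ASCII characters will usually fail it.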

>Questions:
>
>1. What are the current recommendations for these?

- When producing UTF-8 files/documents, *never* produce a 'signature'.
There are quite a few receivers that cannot deal with it, or that deal
with it by displaying something. And there are many other problems.

- When receiving UTF-8, you probably should check for a 'signature'
and remove it. There are too many applications that send one out,
unfortunately.
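
As a rough sketch of the two recommendations above (in Python, with
helper names invented for this example): be liberal on input by
dropping a leading signature if there is one, and be strict on output
by never writing one. The "utf-8-sig" codec and codecs.BOM_UTF8 are
part of the Python standard library; everything else here is
illustrative.

```python
import codecs


def strip_utf8_signature(data: bytes) -> bytes:
    """Remove a leading UTF-8 signature (EF BB BF) if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def decode_received(data: bytes) -> str:
    """Decode received UTF-8; 'utf-8-sig' skips a leading signature."""
    return data.decode("utf-8-sig")


def encode_for_sending(text: str) -> bytes:
    """Encode for output; the plain 'utf-8' codec never adds a signature."""
    return text.encode("utf-8")
```

Only a signature at the very start is touched; a U+FEFF later in the
data is left alone.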

>2. Most important of all, does W3C allow UTF-8 signatures before
>"<!DOCTYPE>"? And if yes, what should be done if they mismatch the
>charset as can be described in the <meta> tag?

For text/html, neither the HTML spec nor the IETF definition of UTF-8
(RFC 2279) says anything about it as far as I know. The reason is that
nobody thought about a UTF-8 signature at that time.

For XML, the 'signature' is now listed in Appendix F.1 of the XML spec:
http://www.w3.org/TR/REC-xml#sec-guessing-no-ext-info
But this is not normative, and fairly recent, so you should never
expect an XML processor to accept it (except as a plain character
in the file when there is no XML declaration).

Regards, Martin.

This archive was generated by hypermail 2.1.2 : Fri Jul 06 2001 - 00:18:17 EDT