Re: charset in HTTP vs. HTML meta (was Re: UTF-16 and HTML META charset)

From: Gary L. Wade (garywade@desisoftsystems.com)
Date: Tue Feb 22 2000 - 12:05:45 EST


Glen,

The very best solution is to have your web server or CGI application
produce the necessary "Content-Type" header with the "charset"
parameter correctly indicated. For dynamically created web pages, you
have the option of emitting this information yourself. For static web
pages, the web server needs to have the capability to produce these
headers; that may mean an upgrade to a newer version or a side-grade
to a different product.
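
For the dynamic case, a minimal CGI sketch in Python (the charset and
page content here are placeholders of my own, not from any particular
product) might look like this:

    #!/usr/bin/env python3
    # Minimal CGI sketch: declare the charset in the Content-Type
    # header so the client never has to guess the encoding.
    import sys

    CHARSET = "shift_jis"   # placeholder: whatever the page really uses
    BODY = "<html><body>...</body></html>"

    # CGI output: headers first, then a blank line, then the body.
    sys.stdout.write("Content-Type: text/html; charset=%s\r\n\r\n" % CHARSET)
    sys.stdout.flush()
    # Encode the body with the same encoding we declared.
    sys.stdout.buffer.write(BODY.encode(CHARSET))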

Ideally, such a web server could know the correct encoding of every
file on a particular web site it is serving and produce this
information accordingly. Means to do this range widely:

 - the content developer telling a server administrator to mark all
   files within a particular folder as being of a particular charset
   (a sketch of this appears after the list);
 - changing the FTP server to recognize charset commands for
   particular files and store that in a database for the web server
   to query;
 - providing a web-based, Java-based, or client-based application
   that can set such information on the web server; or
 - using an alternate file format, such as Apple's resource fork,
   that stores the information alongside the normal data for the web
   server to query (a product I'm working on uses this scheme with a
   ('cset',0) resource whose ASCII text value is returned when
   requested).
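
As an illustration of the first option, marking folders (the mapping
and helper here are hypothetical, not the scheme of any product
mentioned above), a server-side lookup could be as simple as:

    # Hypothetical sketch: map folders to charsets, the way an
    # administrator might mark all files in a folder as one charset.
    FOLDER_CHARSETS = {
        "/jp/": "shift_jis",
        "/ru/": "koi8-r",
        "/":    "iso-8859-1",   # site-wide default
    }

    def charset_for(url_path):
        """Longest-prefix match of the request path against the table."""
        best = None
        for prefix, charset in FOLDER_CHARSETS.items():
            if url_path.startswith(prefix):
                if best is None or len(prefix) > len(best[0]):
                    best = (prefix, charset)
        return best[1] if best else None

    # charset_for("/jp/news/index.html") -> "shift_jis"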

The reason this is a good thing for web client software is that the
user does not have to work out the encoding themselves and risk
getting it wrong, and the client does not have to run sniffing
algorithms to determine the value, which can slow the user's browsing
experience to a crawl. When the header is present, the client can
also ignore any META tag rather than search for one. If the client
does have to search for a META tag, interpretation slows down,
because once the declared encoding is found the client may have to
reinterpret the source it has already read.

In any event, it is never a good idea to declare a wrong encoding. If
you declare no encoding, the user can at least change the page's
encoding through the browser's user interface to what they think it
should be. If the wrong encoding is declared, the user may not be
given the option of changing it, turning a web page with potentially
good content into a mass of garbage.

In essence, always use META tags. Make sure your web server can also
do the work for you in the correct way (i.e., don't rely on server
settings that can only produce "iso-8859-1" when your content is all
"shift_jis"), and fix the server if it cannot and you are able to do
so.

Also, be aware that some content is textual but is not served under a
text/* type, and it still needs an encoding; JavaScript files are one
example (Content-Type: application/javascript).

-- 
Gary L. Wade
Product Development Consultant
DesiSoft Systems             | Voice:   214-642-6883
9619 E. Valley Ranch Parkway | Fax:     972-506-7478
Suite 2125                   | E-Mail:  garywade@desisoftsystems.com
Irving, TX 75063             |

Glen Perkins wrote:
>
> Yes, this is a question I was discussing with Andrea Vine and some others a
> few days ago: whether 'tis nobler to use HTTP headers, HTML meta tags, or
> both, under various real-world circumstances. What are the rules for this in
>
> 1) any standard, as well as
> 2) in practice in various versions of Netscape Navigator and
> 3) in various versions of IE?
>
> Given the current state of things, what's the best approach to serving up
> dynamic content in multiple languages?
>
> Assume you're trying to create a website with dynamically-generated pages in
> lots of languages, but only one language per page. It's not necessarily easy
> to tell the server, page by page, what encoding is being transmitted. Is the
> safest, most reliable approach currently to use only the most common,
> ASCII-based legacy encodings, use no HTTP Content-Type: text/html;
> charset=foo header, but instead include the ASCII meta (http-equiv) tag on
> every page?
>
> (The reason for this approach, by the way, is that it would both work
> reliably now and prepare the way nicely for a rather gradual change from
> those legacy encodings into UTF-8, which would be just another ASCII-based
> encoding in this scenario. It's too early for UTF-8 for the general,
> consumer web pages, but the same web server could begin serving UTF-8 behind
> the firewall, where we could be more daring.)
>
> Would there be problems caused by leaving off the HTTP header charset
> declaration and doing all the charset declarations in the HTML meta tag?
> Would these problems be significant enough that some method really would
> have to be found to include an HTTP header that matched the page's meta tag?
> Would it actually be better to declare a wrong encoding in an HTTP header
> than declare none at all, for some reason (still assuming all pages were
> correctly meta tagged)?
>
> I'm leaving aside the question of non-ASCII-compatible encodings like
> UTF-16, which obviously have different issues. If your meta tag is written
> in UTF-16, somehow you're going to have to know the encoding before you can
> read a meta tag, via HTTP, BOM, or some heuristic. It just doesn't seem
> likely to me that any such encoding would be practical on a busy consumer
> website that only serves one language per page, but has to have that page
> work on a very wide range of browsers. I'm willing to put those encodings
> aside for now in favor of ASCII-compatible encodings.
>
> What's the current "best practice"?
>
> Thanks,
> __Glen__


