Re: UTF-8 code in HTML

From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Tue Apr 11 2000 - 19:35:57 EDT


Jonathan Coxhead wrote:
> Don't we need some conventional file extensions for both plain
> text and HTML encoded in UTF-8, UTF-16, etc.? E.g.
>
> ".utf" => text/plain; charset = utf-8
> ".uni" => text/plain; charset = utf-16
> ".utfml" => text/html; charset = utf-8
> ".uniml" => text/html; charset = utf-16

Having a different extension per encoding is not feasible and, luckily, not necessary for HTML and XML pages, since they are self-describing. You should configure your HTTP server with information about the pages it serves: the charset, the language, and maybe more.
If you don't provide this information, then the browser can still get it out of the HTML page's <meta> tags.
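
As a rough illustration of that lookup order (HTTP header first, then the page's own <meta> tag), here is a minimal Python sketch. The function name detect_charset, the two regular expressions, the 1024-byte scan window and the fallback parameter are all assumptions made for this example; real browsers use more elaborate rules, and scanning raw bytes for <meta> only works when the page is in an ASCII-compatible encoding.

```python
import re

# Hypothetical helpers for illustration; not part of any library.
# Matches charset=... in a Content-Type header value (spaces and quotes allowed).
_HEADER_CHARSET = re.compile(r'charset\s*=\s*"?([^\s;"]+)', re.IGNORECASE)
# Matches charset=... anywhere inside a <meta ...> tag in the raw page bytes.
_META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9._:-]+)', re.IGNORECASE)


def detect_charset(content_type, body, fallback="iso-8859-1"):
    """Return the charset from the HTTP header, else from <meta>, else fallback.

    content_type: value of the Content-Type header, or None if absent.
    body: the raw page bytes.
    fallback: assumed default when neither source names a charset.
    """
    if content_type:
        match = _HEADER_CHARSET.search(content_type)
        if match:
            return match.group(1).lower()
    # Only look near the start of the page, where <meta> tags normally sit.
    match = _META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    return fallback
```

For example, detect_charset("text/html; charset = utf-8", b"") takes the value from the header, while detect_charset("text/html", page_bytes) falls through to whatever the page declares in its <meta> tag.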

By the way, the default charset for HTML is ISO-8859-1, not US-ASCII, I hope.

markus


