Re: browsers and unicode surrogates

From: James H. Cloos Jr. (cloos@jhcloos.com)
Date: Mon Apr 22 2002 - 07:19:45 EDT


>>>>> "Tex" == Tex Texin <texin@progress.com> writes:

Tex> I am surprised by the "must only be used". It seems I am not
Tex> conforming by including a meta statement in the utf-16 HTML
Tex> page. I should either remove the statement or encode the HTML up
Tex> to and including that statement as ascii. I'll check on this.

Since you are using Apache, it is quite easy to get the extra headers
sent at the protocol level rather than having to use meta tags.

You can use a Header directive in an .htaccess file a la:

<Files foobar.html>
  Header set Content-Language en-US
  Header set Content-Type "text/html; charset=UTF-8"
</Files>

Or, you can use mod_cern_meta to put the extra headers in a
foo.html.meta file. (The actual filename suffix can be set in the
.htaccess file or the main server conf files.)
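
A rough sketch of the mod_cern_meta route, assuming the module is
loaded and that the meta file sits in the same directory as the page.
The MetaDir and MetaSuffix values are choices made for this example
(the stock defaults are .web and .meta), and it is worth checking on
your build that a Content-Type line in the meta file is honoured:

# assumed settings, in .htaccess or the server conf
MetaFiles on
MetaDir .
MetaSuffix .meta

With that in place, foo.html.meta would hold plain header lines:

Content-Language: en-US
Content-Type: text/html; charset=UTF-8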

There are other ways as well. Apache will already (if you use the
default configs) add the Content-Language header if you use a filename
like foo.en.html. You could have it also add the charset via a
similar mechanism. Something like:

AddCharset UTF-8 utf8

will make foobar.en.utf8.html send the headers:

Content-Language: en
Content-Type: text/html; charset=UTF-8

given the default configs for language and type extensions.

Hmmm. Looking at a recent install of SuSE, using their Apache RPM,
.utf8 is already configured as an extension to set charset=UTF-8, so
you could try just renaming the file to, e.g.:

http://www.i18nguy.com/unicode-plane1.utf8.html

to set the charset. You'd have to add your own AddCharset directives
for UTF-16 and UTF-32.
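
A minimal sketch of those additions. The .utf16 and .utf32 extensions
are made up for illustration (use whatever suits your filenames), and
if this goes in .htaccess it assumes AllowOverride FileInfo is in
effect:

# hypothetical extensions, not part of any default config
AddCharset UTF-16 .utf16
AddCharset UTF-32 .utf32

A file named along the lines of foo.en.utf16.html should then go out
with charset=UTF-16, provided the bytes on disk really are UTF-16.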

-JimC


