Re: Unicode Support in Java

From: Tim Greenwood (greenwood@OpenMarket.com)
Date: Tue Mar 26 1996 - 04:16:06 EST


Martin and David are correct in their statements about the use of the
'Document Encoding' option on Netscape Navigator and other browsers.
Use of this is actually a violation - though a popular one - of the
HTTP standard. If no charset parameter is provided on the Content-Type
header then the standard (HTTP 1.0 Internet draft 0.5) defines text
content as Latin-1 (ISO-8859-1).

This is the relevant section extracted from
http://www.w3.org/pub/WWW/Protocols/HTTP/1.0/spec.html

The "charset" parameter is used with some media types to define the
character set (Section 3.4) of the data. When no explicit charset
parameter is provided by the sender, media subtypes of the "text" type
are defined to have a default charset value of "ISO-8859-1" when
received via HTTP. Data in character sets other than "ISO-8859-1" or
its subsets must be labelled with an appropriate charset value in
order to be consistently interpreted by the recipient.

     Note: Many current HTTP servers provide data using charsets other
     than "ISO-8859-1" without proper labelling. This situation
     reduces interoperability and is not recommended. To compensate
     for this, some HTTP user agents provide a configuration option to
     allow the user to change the default interpretation of the media
     type character set when no charset parameter is given.
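
To make the rule above concrete, here is a minimal sketch in Java of how
a client might choose the charset for a text/* body. It is only an
illustration under stated assumptions, not how Navigator or any other
browser actually does it: the class name CharsetResolver, the method
resolve, and the userOverride argument (standing in for the 'Document
Encoding' setting) are all hypothetical names of mine. The parsing is
deliberately simple and does not handle a ';' inside a quoted parameter
value.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the names are illustrative, not from any browser or spec.
public class CharsetResolver {

    /**
     * Picks the charset used to decode a text/* body.
     *
     * @param contentType  raw Content-Type header value, or null if absent
     * @param userOverride stand-in for a "Document Encoding" setting, or null
     */
    public static Charset resolve(String contentType, Charset userOverride) {
        if (contentType != null) {
            // Parameters follow the media type, separated by ';'.
            String[] parts = contentType.split(";");
            for (int i = 1; i < parts.length; i++) {
                String param = parts[i].trim();
                int eq = param.indexOf('=');
                if (eq < 0) {
                    continue; // not a name=value parameter
                }
                String name = param.substring(0, eq).trim();
                if (!name.equalsIgnoreCase("charset")) {
                    continue;
                }
                String value = param.substring(eq + 1).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                try {
                    // An explicit label from the sender always wins.
                    return Charset.forName(value);
                } catch (IllegalArgumentException e) {
                    // Unknown or malformed label: treat the data as unlabelled.
                    break;
                }
            }
        }
        // No usable label: the user's setting applies, if there is one.
        if (userOverride != null) {
            return userOverride;
        }
        // Otherwise the HTTP default for text types.
        return StandardCharsets.ISO_8859_1;
    }
}
```

The ordering is the point of the sketch: an explicit charset parameter
is honoured first, the user's setting is consulted only for unlabelled
data, and ISO-8859-1 is the last resort.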

> If it's this option that you had in mind originally, then I am definitely right that
> this setting is only used if the HTTP header does not supply a MIME "charset"
> parameter. Both this MIME "charset" parameter and the "Document Encoding"
> option serve the *same* purpose, namely, to determine the conversion
> from the incoming encoding to whatever is used internally. In a perfect
> world, the documents would identify themselves in the HTTP header; the
> "Document Encoding" option is only a fix for documents that don't include
> the necessary header field and that the user thinks won't make sense when
> interpreted as ISO-8859-1 (Latin-1), which is the HTTP/HTML default.
> If you have any doubts about this, please read the relevant internet
> drafts (among others draft-ietf-html-i18n-03.txt, of which I am a
> coauthor).
>
> You also mention a "Language Option" above. Apart from what should
> be better called "Document Encoding", but appears in some versions
> of Netscape as "Language Encoding", there is also an option on
> the Mac that is rightfully called "Language". It appears under
> "General Preferences...". It serves to choose your language preferences,
> i.e. whether you want a document served in English, Japanese, German,
> American English, British English, and so on (provided it exists in that
> language on the server side). This is something that will still be
> of interest even if we (hopefully) soon don't have to select "Document
> Encoding" anymore (because the documents are correctly marked
> in the HTTP header, or because they all come in Unicode or UTF-8).
>
> Hope this helps. Regards, Martin.
>
>
-------------------------------------
Tim Greenwood Open Market Inc
617 679 0320 greenwd@openmarket.com


