How to create an all UTF-8 Web site using Java (JSP)

From: Paul Deuter (Paul.Deuter@plumtree.com)
Date: Tue Jul 17 2001 - 01:00:15 EDT


How do I create a pure UTF-8 web site? Specifically, is there a way to
change the standard servlet classes to use UTF-8 as the default character
encoding instead of ISO-8859-1?

I have looked at the source code for Jakarta Tomcat 3.2.2 and noticed
the statement:

public static final String DEFAULT_CHAR_ENCODING = "8859-1"; (in
constants.java)

The various classes such as HttpServletRequest and HttpServletResponse
use this constant when creating the default readers and writers and as a
consequence, the web site ends up being Latin-1.
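
To illustrate the output side of this, here is a minimal sketch that does not
use the servlet API at all. It assumes the container builds its response
writer roughly like a plain OutputStreamWriter with the default encoding; the
class and method names (WriterEncodingDemo, encode) are made up for this
example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

class WriterEncodingDemo {
    // Encode text through a Writer that uses the named charset, standing in
    // for a response writer created with the container's default encoding.
    static byte[] encode(String text, String charsetName) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(sink, charsetName)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        // Two CJK characters that have no representation in Latin-1.
        String text = "\u65E5\u672C";
        // With Latin-1 each unmappable character is replaced by a single '?' byte.
        System.out.println("ISO-8859-1 bytes: " + encode(text, "ISO-8859-1").length);
        // With UTF-8 each of these characters is written as three bytes.
        System.out.println("UTF-8 bytes:      " + encode(text, "UTF-8").length);
    }
}
```

Any character outside Latin-1 is silently lost with the default writer, which
is why the default matters for a multilingual site.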

I have experimented with adding the line:

response.setContentType("text/html; charset=UTF-8");

This does correctly change the encoding of the response object to
UTF-8, and subsequent output gets sent to the browser in UTF-8. However,
the request object still misinterprets incoming request data because it
decodes %XX octets as Latin-1 instead of UTF-8.
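
The decoding problem can be reproduced outside the container. The sketch
below assumes a Java release whose URLDecoder has the two-argument
decode(String, String) overload; the class name PercentDecodingDemo and the
sample value are made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class PercentDecodingDemo {
    // Decode a percent-encoded value, interpreting the %XX octets in the
    // named charset.
    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // "caf" followed by U+00E9, percent-encoded from its UTF-8 bytes,
        // as a browser would submit it from a UTF-8 page.
        String wire = "caf%C3%A9";
        // Read as UTF-8, the two octets form one character (U+00E9).
        System.out.println(decode(wire, "UTF-8"));
        // Read as Latin-1, each octet becomes its own character (U+00C3, U+00A9).
        System.out.println(decode(wire, "ISO-8859-1"));
    }
}
```

The second call shows the kind of value that getParameter() hands back when
the request is decoded as Latin-1.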

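I know there is special code that I can write, such as:

String param = request.getParameter("parameter1");
// Recover the raw octets that were misread as Latin-1.
byte[] rawVal = param.getBytes("ISO-8859-1");
// Create the string again, this time decoding the octets as UTF-8.
param = new String(rawVal, "UTF-8");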
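
For reference, the same workaround as a small self-contained helper. This is
only a sketch: the names ParameterRecoder and latin1ToUtf8 are made up, and
main() simulates the container's Latin-1 decoding rather than calling a real
request object.

```java
import java.nio.charset.StandardCharsets;

class ParameterRecoder {
    // Undo a Latin-1 misdecoding: Latin-1 maps every byte to exactly one
    // character, so getBytes() recovers the original octets unchanged,
    // and they can then be decoded as UTF-8.
    static String latin1ToUtf8(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "caf\u00E9";
        // Simulate what the container returns: UTF-8 octets read as Latin-1.
        String misdecoded = new String(
                original.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);
        String repaired = latin1ToUtf8(misdecoded);
        System.out.println("repaired equals original: " + repaired.equals(original));
    }
}
```

This only works when the parameter really was sent as UTF-8, so it has to be
applied consistently to every parameter on every page.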

However, I would prefer not to have to write special code to re-interpret
data after the fact. There are also other standard classes that
seem to assume ISO-8859-1 as the default character set (such as
URLDecoder and URLEncoder). Since internal data will always be Unicode,
I would prefer to set the default encoding to UTF-8 and be able to write
standard Java code.

Is there an easy way to override the default encoding at a low level so
that all the classes that use the default encoding will just work?

Thanks,
Paul Deuter
Internationalization Manager
Plumtree Software
paul.deuter@plumtree.com

This archive was generated by hypermail 2.1.2 : Tue Jul 17 2001 - 01:45:57 EDT