Re: UTF-8

From: Stephen Toner (
Date: Wed Sep 20 2000 - 07:16:57 EDT

What exactly happens when I use the <%@ page contentType="text/html; charset=UTF-8"
%> directive? When I include it, letters that were stored correctly in the
database, for example, are rendered incorrectly. They come back in exactly the
same form as they were entered in the form, but when I try to output them to a
page with this directive, the UTF-8 bytes are not combined into a single
character; each byte is treated as a separate character instead. Without the
directive, I just use the meta tag to have the browser interpret the bytes as
UTF-8.
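To see why the directive changes the output, here is a minimal plain-Java
simulation (no JSP or servlet classes; DirectiveDemo, decodedAsLatin1 and
pageBytes are made-up names). It assumes that the form data was decoded as
ISO-8859-1 on the way in, so each UTF-8 byte became its own Java char before
it reached the database.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DirectiveDemo {
    // Input side (assumed): the browser sent UTF-8 bytes, but they were decoded
    // as ISO-8859-1, so every byte became a separate Java char.
    public static String decodedAsLatin1(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Output side: the page writer encodes each char with the page charset.
    public static byte[] pageBytes(String text, Charset pageCharset) {
        return text.getBytes(pageCharset);
    }

    public static void main(String[] args) {
        String original = "\u65E5";               // a Japanese character, 3 bytes in UTF-8
        String stored = decodedAsLatin1(original); // what ends up in the database
        System.out.println("chars stored: " + stored.length()); // 3

        // No directive: ISO-8859-1 page, so the 3 chars go out as the original
        // 3 bytes and a browser told "UTF-8" by the meta tag rebuilds the character.
        byte[] latin1Page = pageBytes(stored, StandardCharsets.ISO_8859_1);
        // charset=UTF-8 directive: each of the 3 chars is UTF-8 encoded on its own.
        byte[] utf8Page = pageBytes(stored, StandardCharsets.UTF_8);

        System.out.println("ISO-8859-1 page bytes: " + latin1Page.length); // 3
        System.out.println("UTF-8 page bytes: " + utf8Page.length);        // 6
        System.out.println(original.equals(new String(latin1Page, StandardCharsets.UTF_8))); // true
        System.out.println(original.equals(new String(utf8Page, StandardCharsets.UTF_8)));   // false
    }
}
```

Under that assumption the directive is doing its job: the stored string already
holds three separate characters, and the ISO-8859-1 page only looks right
because it writes each one back out as the original byte.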
----- Original Message -----
From: <>
To: "Stephen Toner" <>
Cc: "Unicode List" <>
Sent: Tuesday, September 19, 2000 4:36 PM
Subject: Re: UTF-8

Hi Stephen,

Java's internal encoding is UTF-16: every String is stored as UTF-16. Since
web pages are rarely served in that encoding, JSP provides a basic mechanism
for setting up a character set converter (essentially an InputStreamReader
and an OutputStreamWriter).
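As a rough illustration of that converter pair, here is a minimal sketch
(CharsetBridge, readAll and writeAll are hypothetical names, not part of any
JSP API) that goes from bytes in a named charset to a UTF-16 String and back
out to bytes again.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetBridge {
    // Bytes in a named charset -> UTF-16 Java String, via InputStreamReader.
    public static String readAll(byte[] bytes, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader in = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            char[] buf = new char[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            // In-memory streams should not fail; rethrow unchecked just in case.
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // UTF-16 Java String -> bytes in a named charset, via OutputStreamWriter.
    public static byte[] writeAll(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, charset)) {
            w.write(text); // closing the writer flushes any buffered bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] utf8 = {(byte) 0xE6, (byte) 0x97, (byte) 0xA5}; // one Japanese character in UTF-8
        String s = readAll(utf8, StandardCharsets.UTF_8);
        System.out.println("UTF-16 code units: " + s.length());                          // 1
        System.out.println("UTF-8 bytes: " + writeAll(s, StandardCharsets.UTF_8).length);       // 3
        System.out.println("UTF-16BE bytes: " + writeAll(s, StandardCharsets.UTF_16BE).length); // 2
    }
}
```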

The default page encoding for JSP is ISO-8859-1. If you use the <%@ page
contentType="text/html; charset=UTF-8" %> directive in your page, the page
will be processed and written out as UTF-8 instead of ISO-8859-1.

If you want to receive a UTF-8 "POST" or "GET" in an ISO-8859-1 page, you
will need to set up the InputStreamReader to convert the characters
yourself. I know I'm being sketchy here, but I'm running late this morning.
Let me know if the contentType directive doesn't fix your problem.
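One way to do that conversion yourself, sketched under the assumption that you
can get at the raw POST body (or the raw query string for a GET): a
url-encoded form is plain ASCII with each UTF-8 byte written as a %XX escape,
so the charset that matters is the one used to decode the escapes. FormDecoder,
readBody and parse are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class FormDecoder {
    // Reads the raw body. A url-encoded body is plain ASCII, so decoding it as
    // ISO-8859-1 here is lossless; the real conversion happens in parse().
    public static String readBody(InputStream in) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(in, StandardCharsets.ISO_8859_1)) {
            char[] buf = new char[1024];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Splits name=value pairs and turns %XX escapes (and '+') back into
    // characters using the charset the form was submitted in.
    public static Map<String, String> parse(String body, String charset) {
        Map<String, String> fields = new LinkedHashMap<>();
        try {
            for (String pair : body.split("&")) {
                if (pair.isEmpty()) {
                    continue;
                }
                int eq = pair.indexOf('=');
                String name = eq < 0 ? pair : pair.substring(0, eq);
                String value = eq < 0 ? "" : pair.substring(eq + 1);
                fields.put(URLDecoder.decode(name, charset), URLDecoder.decode(value, charset));
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Stand-in for the request body a browser sends for one Japanese character.
        InputStream body = new ByteArrayInputStream(
                "name=%E6%97%A5".getBytes(StandardCharsets.ISO_8859_1));
        String raw = readBody(body);
        System.out.println(parse(raw, "UTF-8").get("name").length());      // 1
        System.out.println(parse(raw, "ISO-8859-1").get("name").length()); // 3
    }
}
```

Decoding the same escapes as ISO-8859-1 gives three characters instead of one,
which is the mismatch to avoid.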



On Tue, 19 Sep 2000, Stephen Toner wrote:

> Hi,
> I am still having trouble with UTF-8 input from a browser. The problem
> is that my database can't store UTF-8, only UTF-16. I have tried to
> convert between the two with little success. The trouble is that the
> input string is obtained from the request object using String
> This leaves me with a string which I think (please correct me if I'm
> wrong) is correctly encoded in UTF-8 (for example, a Japanese character
> was converted to a 3-byte sequence). However, the String API only lets
> me convert a byte array containing non-Unicode text to Unicode, or
> convert a String object into a byte array of non-Unicode characters.
> What I have is a string of non-Unicode characters which I must convert
> to Unicode characters. I tried converting it to bytes: without
> specifying the encoding, this left 2 question marks in, and specifying
> UTF-8 just converted each character to UTF-16, giving 6 bytes instead of
> the 2 bytes that I wanted. If I could somehow get the byte value of each
> character I would be flying, but unfortunately a load of different
> characters get converted to 3F, the code for a question mark.
> Does anyone know of any way of converting directly in Java?
> Also, when I submit a form page with the encoding specified as UTF-8,
> what actually does the conversion from what is in the form to UTF-8?
> Thanks for any help,
> Stephen
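If a string has already been decoded as ISO-8859-1, the bytes are not lost:
ISO-8859-1 maps every char from U+0000 to U+00FF to exactly one byte, so
encoding back with it recovers the browser's original UTF-8 bytes, which can
then be decoded properly. A minimal sketch, assuming that is how the container
decoded the parameter (RepairMojibake and repair are illustrative names):

```java
import java.nio.charset.StandardCharsets;

public class RepairMojibake {
    // Undoes a wrong ISO-8859-1 decode: one char -> one byte gives back the
    // original UTF-8 bytes, which are then decoded as UTF-8.
    public static String repair(String decodedAsLatin1) {
        byte[] originalBytes = decodedAsLatin1.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // What a parameter looks like if UTF-8 bytes E6 97 A5 were decoded as ISO-8859-1.
        String fromRequest = "\u00E6\u0097\u00A5";
        System.out.println(fromRequest.getBytes(StandardCharsets.UTF_8).length);      // 6: each char re-encoded
        System.out.println(fromRequest.getBytes(StandardCharsets.ISO_8859_1).length); // 3: the original bytes
        System.out.println(repair(fromRequest).equals("\u65E5"));                     // true
    }
}
```

Calling getBytes() with no argument uses the platform default charset, and
characters it cannot represent typically come out as 0x3F ('?'), which is
consistent with the question marks described above. In a container that
implements Servlet 2.3 or later, calling request.setCharacterEncoding("UTF-8")
before reading any parameter avoids the round trip altogether.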

Addison P. Phillips Principal Consultant
Inter-Locale LLC
Los Gatos, CA, USA

+1 408.210.3569 (mobile) +1 408.904.4762 (fax)
Globalization Engineering & Consulting Services

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:13 EDT