Re: UTF-8

Date: Tue Sep 19 2000 - 11:36:49 EDT

Hi Stephen,

Java's internal encoding is UTF-16: every String is a sequence of UTF-16
code units. Since almost no web pages are served in that encoding, JSP
provides a basic mechanism for setting up a character set converter
(essentially an InputStreamReader and an OutputStreamWriter).
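
As a rough sketch of what such a converter does (this is not the JSP
engine's actual code; the class and method names are made up for
illustration), the reader side turns encoded bytes into a String and the
writer side turns a String back into encoded bytes:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
class CharsetRoundTrip {

    // Decode UTF-8 bytes into a String (held as UTF-16 code units).
    static String decodeUtf8(byte[] bytes) {
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new UncheckedIOException(e);
        }
    }

    // Encode a String as UTF-8 bytes.
    static byte[] encodeUtf8(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            writer.write(text);
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

A three-byte UTF-8 sequence for one Japanese character decodes to a
String of length 1, and encoding that String again gives back the same
three bytes.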

The default page encoding for JSP is ISO-8859-1. The JSP engine will use
UTF-8 instead of 8859-1 if you put the following directive in your page:
<%@ page contentType="text/html; charset=utf-8" %>

If you wish to receive a UTF-8 "POST" or "GET" in an 8859-1 page, you will
need to set up the InputStreamReader to convert the characters yourself. I
know I'm being sketchy here, but I'm running late this morning. Let me
know if the contentType directive doesn't fix your problem.
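
For the manual route, here is a minimal sketch. It assumes the container
decoded the submitted UTF-8 bytes as ISO-8859-1, so that each char in the
String from request.getParameter() holds exactly one original byte. If
the container used some other encoding, this will not recover the text.
The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
class ParameterRepair {

    // Assumption: `raw` came from UTF-8 bytes that were wrongly decoded
    // as ISO-8859-1. Re-encoding as ISO-8859-1 recovers the original
    // bytes, which are then decoded correctly as UTF-8.
    static String reinterpretAsUtf8(String raw) {
        byte[] originalBytes = raw.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    // Big-endian UTF-16 bytes without a byte order mark, for a store
    // that wants UTF-16.
    static byte[] toUtf16Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE);
    }
}
```

You would call it as ParameterRepair.reinterpretAsUtf8(temp), where temp
is the value from request.getParameter("TheText"). A character that
arrived as three mis-decoded chars comes out as one char, and
toUtf16Bytes() then gives two bytes for it.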



On Tue, 19 Sep 2000, Stephen Toner wrote:

> Hi,
> I am still having trouble with inputted UTF-8 from a browser. The problem is that my database can't store UTF-8 but only UTF-16. I have tried to convert between the two with little success. The trouble is that the inputted string is obtained from the request object using String temp=request.getParameter("TheText");
> This leaves me with a string which I think (please correct me if I'm wrong) is correctly encoded in UTF-8. (For example, a Japanese character was converted to a 3-byte sequence.) However, the String API only allows me to convert a byte array containing non-Unicode text to Unicode, or to convert a String object into a byte array of non-Unicode characters. But what I have is a string of non-Unicode characters which I must convert to Unicode characters. I tried converting it to bytes, which without specifying the encoding left 2 question marks in, and with specifying the encoding as UTF-8 just converted each character to UTF-16, giving 6 bytes instead of the 2 bytes that I wanted. If I was able to somehow get the byte values for each character I would be flying, but unfortunately a load of different characters get converted to 3F, the code for a question mark.
> Does anyone know of any way of converting directly in Java?
> Also when I submit a form page with the encoding specified as UTF-8 what actually does the converting from what is in the form to UTF-8?
> Thanks for any help,
> Stephen

Addison P. Phillips Principal Consultant
Inter-Locale LLC
Los Gatos, CA, USA

+1 408.210.3569 (mobile) +1 408.904.4762 (fax)
Globalization Engineering & Consulting Services
