Re: posting of unicode data to servlet

From: addison@inter-locale.com
Date: Wed Nov 29 2000 - 12:12:59 EST


Hi Bhala,

When you use request.getParameter(), the request object converts the POSTed
data to a Java String. This includes converting the data from whatever
encoding the servlet *thinks* the page uses into Java's internal
representation, which is UCS-2 (i.e. Unicode).
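To make that decoding step concrete, here is a small stand-alone sketch that uses java.net.URLDecoder (Java 1.4 and later) to imitate what a servlet engine does with a form field. The field value %E4%B8%AD (the UTF-8 bytes of the Chinese character U+4E2D) and the class and method names are hypothetical; a real container does this internally, and the details vary by vendor. Decoding with the charset the page really used yields one character, while a wrong guess yields three unrelated ones.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class: imitates how a form field is turned into a String.
final class FormDecodingDemo {
    /** Decodes a percent-encoded form value using the named charset. */
    static String decodeField(String raw, String charsetName) {
        try {
            return URLDecoder.decode(raw, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // The browser sent U+4E2D as UTF-8 bytes, percent-encoded in the POST body.
        String rawValue = "%E4%B8%AD";

        // Right guess: the three bytes become one char.
        String right = decodeField(rawValue, "UTF-8");
        // Wrong guess (ISO-8859-1): each byte becomes its own char.
        String wrong = decodeField(rawValue, "ISO-8859-1");

        System.out.println(right.length()); // 1
        System.out.println(wrong.length()); // 3
        System.out.printf("U+%04X%n", (int) right.charAt(0)); // U+4E2D
    }
}
```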

It is therefore important to tell the servlet what the encoding of the page
is. Just putting a META tag into the page won't do it. In a JSP page, for
example, you can declare:

 <%@ page contentType="text/html; charset=UTF-8" %>

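Note that your META tag has a few typos in it. There should not be a
double-quote after charset=, the attribute name is HTTP-EQUIV (with a
hyphen, not an underscore), and the media type is text/html, not text-html.
The corrected tag reads:

 <META HTTP-EQUIV="Content-Type" content="text/html; charset=UTF-8">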

You should be aware that you can generate the page in any valid character
set and WebLogic's servlet engine will convert the results to Unicode for
you. For example, you might choose the Big5 character encoding for a
Traditional Chinese page. A matching page directive (charset=Big5) will
result in data POSTed to you being converted to a Java String (and thus to
Unicode).
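The following sketch illustrates why the choice of page encoding stops mattering once decoding is done correctly: the character U+4E2D has different byte sequences in Big5 and UTF-8, but each decodes to the same Java String. It uses only the standard library (the String methods that take a Charset, Java 6 and later) and assumes the JRE includes the Big5 charset, which standard JDKs ship but stripped-down runtimes may omit. The class name is illustrative.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical demo class: same text, different bytes, identical String.
final class SameTextDifferentBytes {
    /** Decodes bytes with the named charset into a Java String. */
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String zhong = "\u4E2D"; // the character U+4E2D

        // The same character encoded two ways yields different bytes...
        byte[] big5 = zhong.getBytes(Charset.forName("Big5"));   // 2 bytes
        byte[] utf8 = zhong.getBytes(Charset.forName("UTF-8"));  // 3 bytes
        System.out.println(Arrays.equals(big5, utf8)); // false

        // ...but decoding each with its own charset gives an identical String.
        System.out.println(decode(big5, "Big5").equals(decode(utf8, "UTF-8"))); // true
    }
}
```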

If you want access to the specific *characters* in the String, use the
String methods for accessing chars and char arrays (such as charAt() and
toCharArray()) together with the Character class, which exposes all kinds
of useful information about specific characters. Calling getBytes() the way
you've described converts the characters back into a byte-oriented
encoding, such as UTF-8, which is not really what you want in this case.
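A minimal sketch of inspecting the characters directly, assuming the String has already been decoded correctly (the literal below is a hypothetical stand-in for request.getParameter("name"), and the class and method names are illustrative). It prints each char as a U+XXXX code, asks Character.UnicodeBlock and Character.isLetter about it, and contrasts that with getBytes(), whose length counts UTF-8 bytes rather than characters.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: reports the Unicode value of each char.
final class CharInspector {
    /** Formats each 16-bit char of s as "U+XXXX", separated by spaces. */
    static String toCodes(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for request.getParameter("name"): "A" followed by U+4E2D.
        String str = "A\u4E2D";
        System.out.println(toCodes(str)); // U+0041 U+4E2D

        // The Character class can describe each char further.
        for (char c : str.toCharArray()) {
            System.out.println(Character.UnicodeBlock.of(c) + " letter=" + Character.isLetter(c));
        }

        // getBytes() gives encoded bytes, not Unicode values: 2 chars, 4 bytes.
        System.out.println(str.getBytes(StandardCharsets.UTF_8).length); // 4
    }
}
```

This works one 16-bit char at a time, which matches the JRE 1.2 model described above. Characters outside the Basic Multilingual Plane occupy two chars (a surrogate pair); on Java 5 and later, String.codePointAt() and Character.charCount() handle those.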

Best Regards,

Addison

===========================================================
Addison P. Phillips Principal Consultant
Inter-Locale LLC http://www.inter-locale.com
Los Gatos, CA, USA mailto:addison@inter-locale.com

+1 408.210.3569 (mobile) +1 408.904.4762 (fax)
===========================================================
Globalization Engineering & Consulting Services

On Wed, 29 Nov 2000, Bhalchandra Patil wrote:

> Hi,
>
> I am running a servlet on WebLogic (JRE 1.2). The HTML page should accept
> input in any character set, say Chinese. That value is posted to the servlet.
> I want to retrieve the Unicode value of the character in the servlet.
>
> In the HTML page, I have specified this meta tag:
> <META HTTP_EQUIV="Content-Type" content="text-html; charset="UTF-8">
>
> In the servlet, I am using String str = request.getParameter("name");
> str.getBytes("UTF8") does not work.
>
> What should I do to get the Unicode values of the characters entered?
>
> Please help!!!!!
>
> regards,
> bhala



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:15 EDT