Re: posting of unicode data to servlet

From: Bhalchandra Patil (bpatil@mahindrabt.com)
Date: Wed Nov 29 2000 - 13:05:26 EST


Thanks Addison,

But I am still not clear. I have fixed the mistake in the META tag.

I could not find any equivalent of <%@ page contentType="text/html; charset=UTF-8" %>
in the servlet APIs.

However, I have changed the default character set of the web server to UTF-8.

Now I am entering only one Chinese character in the text field (named, say,
"uniname") in the HTML page, whose Unicode value is 20840 decimal, or 0x5168 hex.
When I submit the page, the request goes to the servlet, and when I call
String str = request.getParameter("uniname"), I expect a string of length 1
(and (long)str.charAt(0) should give me 20840). That is the string I want.
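That expectation can be stated precisely in code: the decoded parameter should be exactly one char whose value is 0x5168 (20840 decimal). In this sketch the parameter value is hard-coded, since there is no servlet container involved, and `isExpected` is a hypothetical helper, not part of the servlet API.

```java
// States the expectation precisely: one char, U+5168 (20840 decimal).
public class ExpectedParam {

    // Hypothetical helper: true if the decoded parameter is exactly the one
    // character that was typed into the form.
    public static boolean isExpected(String str) {
        return str != null && str.length() == 1 && str.charAt(0) == 0x5168;
    }

    public static void main(String[] args) {
        System.out.println(0x5168);                      // 20840
        System.out.println(isExpected("\u5168"));        // true
        System.out.println(isExpected("\u0091\u0053"));  // false: two chars, 145 and 83
    }
}
```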

Is that the right form of Unicode string to expect?

Instead, I get a string of two characters, with values 145 and 83.
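For what it's worth, 145 and 83 are the bytes 0x91 0x53, which is the Shift_JIS encoding of U+5168. That suggests (though it does not prove) that the browser submitted the form in Shift_JIS rather than UTF-8, and that the container then turned each byte into one ISO-8859-1 character. The sketch below reproduces the symptom; the posted bytes are hard-coded as an assumption, and the Shift_JIS step only runs if the JRE provides that charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Reproduces the reported symptom: two posted bytes, decoded as ISO-8859-1,
// become two chars whose values equal the byte values.
public class MisreadBytes {

    // Assumption: the bytes the browser actually sent for the one character.
    public static final byte[] POSTED = { (byte) 0x91, (byte) 0x53 };

    // What a container does if it assumes ISO-8859-1 for the form data.
    public static String misread(byte[] body) {
        return new String(body, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String str = misread(POSTED);
        System.out.println(str.length());        // 2
        System.out.println((int) str.charAt(0)); // 145
        System.out.println((int) str.charAt(1)); // 83

        // Decoding the same bytes as Shift_JIS gives back the intended character.
        if (Charset.isSupported("Shift_JIS")) {
            String sjis = new String(POSTED, Charset.forName("Shift_JIS"));
            System.out.println((int) sjis.charAt(0)); // 20840
        }
    }
}
```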

Is there a fundamental mistake I am making, or is this something to do with
the web server's handling of posted Unicode data?
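A common workaround for containers that decode form parameters as ISO-8859-1 when no request encoding is set (an assumption about the container, not something confirmed for WebLogic here) is to recover the raw bytes and decode them again with the page's real charset. `redecode` is a hypothetical helper sketching that. It only helps if the browser really submitted UTF-8, which in practice means the form page itself must be served as UTF-8, e.g. with `response.setContentType("text/html; charset=UTF-8")` in the servlet that writes it. Containers implementing Servlet 2.3 or later also offer `request.setCharacterEncoding("UTF-8")`, called before the first `getParameter`, which avoids the round trip.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Undoes an ISO-8859-1 decoding of form data and re-decodes the same bytes
// with the charset the page was actually served in.
public class Redecode {

    // Hypothetical helper, not a servlet API method.
    public static String redecode(String latin1Value, Charset actual) {
        if (latin1Value == null) {
            return null; // parameter was absent
        }
        // ISO-8859-1 maps bytes 0x00-0xFF one-to-one onto U+0000-U+00FF, so this
        // recovers the original bytes. Chars above U+00FF would become '?', which
        // means the value was not Latin-1 decoded in the first place.
        byte[] raw = latin1Value.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, actual);
    }

    public static void main(String[] args) {
        // Simulate a UTF-8 form post of U+5168 (bytes E5 85 A8) misread as ISO-8859-1.
        byte[] utf8 = "\u5168".getBytes(StandardCharsets.UTF_8);
        String asReceived = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(asReceived.length()); // 3

        String fixed = redecode(asReceived, StandardCharsets.UTF_8);
        System.out.println(fixed.length());        // 1
        System.out.println((int) fixed.charAt(0)); // 20840
    }
}
```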

regards,
bhala

----- Original Message -----
From: <addison@inter-locale.com>
To: Bhalchandra Patil <bpatil@mahindrabt.com>
Cc: Unicode List <unicode@unicode.org>
Sent: Wednesday, November 29, 2000 10:42 PM
Subject: Re: posting of unicode data to servlet

> Hi Bhala,
>
> When you use request.getParameter( ) the request class converts the data
> POSTed to a Java String object. This includes converting the data from
> whatever the servlet *thinks* the page is encoded as to Java's internal
> representation, which is UCS-2 (i.e. Unicode).
>
> It is therefore important to tell the servlet what the encoding of the page
> is. Just putting a META tag into the page won't do it. In a JSP page, for
> example, you can declare:
>
> <%@ page contentType="text/html; charset=UTF-8" %>
>
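> Note that your META tag has typos in it. There should not be a
> double-quote after the charset=; also, the attribute is HTTP-EQUIV (with a
> hyphen) and the content type is text/html (with a slash).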
>
> You should be aware that you can generate the page in any valid character
> set and WebLogic's servlet engine will convert the results to Unicode for
> you. For example, you might choose to use the Big5 character encoding for
> a Traditional Chinese page. The page directive will result in data POSTed
> to you being converted to a Java String (and thus Unicode).
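The point that the page encoding stops mattering once the data is decoded can be checked in a few lines: the same character, arriving as different byte sequences, decodes to an equal String as long as each is decoded with its matching charset. This sketch uses UTF-8 and UTF-16BE only because every Java platform is required to support them; Big5 would behave the same way where the JRE provides it.

```java
import java.nio.charset.StandardCharsets;

// The same character encoded two different ways decodes to an equal String
// when each byte sequence is decoded with its own charset.
public class SameString {

    public static final byte[] UTF8_BYTES = { (byte) 0xE5, (byte) 0x85, (byte) 0xA8 }; // U+5168 in UTF-8
    public static final byte[] UTF16BE_BYTES = { (byte) 0x51, (byte) 0x68 };           // U+5168 in UTF-16BE

    public static void main(String[] args) {
        String fromUtf8 = new String(UTF8_BYTES, StandardCharsets.UTF_8);
        String fromUtf16 = new String(UTF16BE_BYTES, StandardCharsets.UTF_16BE);

        System.out.println(fromUtf8.equals(fromUtf16)); // true
        System.out.println(fromUtf8.equals("\u5168"));   // true
    }
}
```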
>
> If you want to get access to the specific *characters* in the String you
> can use the various methods for accessing chars and char arrays in the
> String class in conjunction with the Character class to access all kinds
> of useful information about specific characters. Using getBytes() the way
> you've described will result in converting the characters to a byte-oriented
> encoding, such as UTF-8, which is not really what you want to do
> in this case.
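A minimal sketch of inspecting characters directly through String and Character, rather than via getBytes(). It walks code points with `codePointAt` (Java 5 and later, so newer than the JRE 1.2 in this thread) so that characters outside the BMP are not split into two chars; `describe` is a hypothetical helper name and the input string is hard-coded.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Describes each character of a decoded String using String and Character,
// instead of converting the String back into bytes.
public class InspectChars {

    // Hypothetical helper: one line per code point in the string.
    public static List<String> describe(String str) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < str.length()) {
            int cp = str.codePointAt(i);
            out.add(String.format(Locale.ROOT, "U+%04X decimal=%d letter=%b block=%s",
                    cp, cp, Character.isLetter(cp), Character.UnicodeBlock.of(cp)));
            i += Character.charCount(cp); // 2 chars for supplementary characters
        }
        return out;
    }

    public static void main(String[] args) {
        String str = "\u5168"; // stand-in for request.getParameter("name")
        describe(str).forEach(System.out::println);

        // By contrast, getBytes() yields an encoded byte sequence:
        // three UTF-8 bytes here, none of which is the value 20840.
        System.out.println(str.getBytes(StandardCharsets.UTF_8).length); // 3
    }
}
```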
>
> Best Regards,
>
> Addison
>
> ===========================================================
> Addison P. Phillips Principal Consultant
> Inter-Locale LLC http://www.inter-locale.com
> Los Gatos, CA, USA mailto:addison@inter-locale.com
>
> +1 408.210.3569 (mobile) +1 408.904.4762 (fax)
> ===========================================================
> Globalization Engineering & Consulting Services
>
> On Wed, 29 Nov 2000, Bhalchandra Patil wrote:
>
> > Hi,
> >
> > I am running a servlet on WebLogic (JRE 1.2). The HTML page should accept
> > input in any character set, say Chinese. That value is posted to the servlet.
> > I want to retrieve the Unicode value of the character in the servlet.
> >
> > In the HTML page, I have specified the META tag
> > <META HTTP_EQUIV="Content-Type" content="text-html; charset="UTF-8">
> >
> > In the servlet, I am using String str = request.getParameter("name"), but
> > str.getBytes("UTF8") does not work.
> >
> > What should I do to get the Unicode values of the characters entered?
> >
> > Please help!!!!!
> >
> > regards,
> > bhala
>



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:15 EDT