Re: Translating chinese into unicode

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Mon Jan 16 2006 - 14:16:35 CST


    This was not the question, and the reply is full of errors.

    Your reply is just about how to get a variable from the form data submitted, and it only works if the data was sent encoded with ISO-8859-1 (so it can't process Chinese user input!).

    And your code is unnecessarily bogus: getParameter already returns a correctly decoded string (decoded according to the form submission metadata that specifies the encoding used by the remote agent), so there is no need to re-encode it to ISO-8859-1 as an array of bytes before decoding it again with the String() constructor. That second step is bogus too, because the one-parameter String constructor uses the *platform default* encoding, which will often be different from ISO-8859-1, and bytes produced by an ISO-8859-1 encoder can never yield a string with Han ideographs!
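    To illustrate the point above, here is a minimal sketch of what the round trip does to a string that was already decoded correctly. The class and method names are hypothetical, the input string stands in for what getParameter() would return, and java.nio.charset.StandardCharsets assumes Java 7 or later. The charsets are passed explicitly here so the result does not vary between machines.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {
    // Hypothetical helper: re-encodes an already decoded string to ISO-8859-1
    // and decodes the bytes again, as the criticized snippet does.
    static String lossyRoundTrip(String decoded) {
        // Characters outside ISO-8859-1 are replaced by '?' at this step.
        byte[] latin1 = decoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Two Han ideographs, written as escapes so the source file encoding does not matter.
        String han = "\u4e2d\u6587";
        System.out.println(lossyRoundTrip(han)); // prints "??"

        // The one-parameter String(byte[]) constructor would use this charset
        // instead, so its output depends on where the code runs.
        System.out.println(Charset.defaultCharset());
    }
}
```

    The ideographs are lost as soon as getBytes() runs, so no choice of decoder afterwards can bring them back.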

    The only case where you would use such code is when the submitted form data does not explicitly state its encoding. In that case, the getParameter() method returns a String decoded with an assumed local encoding (a decoding that may already have failed if the submitted bytes are invalid in that encoding).

    The correct way to write it (only when the submitted form data does not specify the charset, something rare today with most browsers!) is then:

    new String(request.getParameter("simple_name").getBytes("ISO8859-1"), "UTF-8")

    where the "UTF-8" parameter explicitly names the actual encoding of the form data (if it comes from an HTML form encoded with UTF-8); you MUST replace it with "ISO-8859-1" if the HTML form was encoded with ISO-8859-1.
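    The repair described above can be sketched as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, the container is assumed to have decoded the raw bytes as ISO-8859-1 because the request named no charset, and StandardCharsets assumes Java 7 or later.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RedecodeDemo {
    // Hypothetical simulation of the failure: the browser sends UTF-8 bytes,
    // but they are decoded as ISO-8859-1 before the application sees them.
    static String misdecodeAsLatin1(String original) {
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);
        return new String(wire, StandardCharsets.ISO_8859_1);
    }

    // Recovers the raw bytes (ISO-8859-1 maps every byte to one char and back)
    // and decodes them with the encoding the form actually used.
    static String redecode(String misdecoded, Charset actual) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, actual);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587"; // two Han ideographs
        String garbled = misdecodeAsLatin1(original);
        String repaired = redecode(garbled, StandardCharsets.UTF_8);

        System.out.println(garbled.length());          // prints 6
        System.out.println(repaired.equals(original)); // prints true
    }
}
```

    The repair works only because the first, wrong decoding was lossless. If the container had used a charset that rejects or replaces some byte values, the original bytes could not be recovered this way.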

    NEVER rely on the default platform encoding (the one-parameter String(byte[]) constructor uses it implicitly, so the result depends on the machine the code runs on).

    EXTREMELY BAD REPLY!

    ----- Original Message -----
    From: YAO Jiankang
    To: Muchchandi, Ratnaprabha (GE Consumer & Industrial)
    Cc: unicode@unicode.org
    Sent: Monday, January 16, 2006 9:27 AM
    Subject: Re: Translating chinese into unicode

    Usually, in practice, we use it in the following form in JSP:

     String(request.getParameter("simple_name").getBytes("ISO8859-1"));


