Re: Java and UTF

From: Glen Perkins (gperkins@netcom.com)
Date: Wed Jul 02 1997 - 17:05:29 EDT


Pierre Lewis <lew@nortel.ca> wrote:
>

> > TO INPUT UTF-8 TEXT:
> >
> > try
> > {
> > BufferedReader in = new BufferedReader(
> > new InputStreamReader(
> > new FileInputStream("Sample.utf"), "UTF8"));
> >
> > inputStr = in.readLine();
> > }
>
> Strange. The book I was referring to (Java in a nutshell, 2nd)
> doesn't show this last constructor. What JDK is that in? Is the
> book already out-of-date (I bought it just 2 days ago).
>

I got it from the source code. Those are the only docs I really
trust. ;-) Notice that the encoding parameter is actually part of the
InputStreamReader constructor's signature, not the FileInputStream's.
I changed the indentation style in the code below to make that more
obvious. I think you'll find it compiles (and even works ;-) ) in any
version of Java 1.1, but I'm using JDK 1.1.2.

Just to prove it, I've written a complete program that takes a String,
writes it to a file as UTF-8, reads it back in as UTF-8, and prints it
to the console. You can see for yourself that it compiles. The console
on most machines will just print "??" where the kanji for "Nihon" ought
to be in the string, which proves at the very least that it knows those
6 bytes equal two chars, but you can redirect it somewhere if you want
more proof. You can also verify that the file contains the hex you
expect.

Here's the complete source:
========================================
import java.io.*;

public class IOTest
{
    public static void main(String[] args)
    {
        try
        {
            PrintWriter out = new PrintWriter
                              (
                                  new BufferedWriter
                                  (
                                      new OutputStreamWriter
                                      (
                                          new FileOutputStream("test.utf"),
                                          "UTF8"
                                      )
                                  )
                              );

            out.println("Hello, \u65e5\u672c");
            out.close();
        }
        catch (Exception e) {System.out.println(e);}
        
        try
        {
            BufferedReader in = new BufferedReader
                                (
                                    new InputStreamReader
                                    (
                                        new FileInputStream("test.utf"),
                                        "UTF8"
                                    )
                                );

            String inputStr = in.readLine();
            in.close();
            System.out.println(inputStr);
        }
        catch (Exception e) {System.out.println(e);}
    }
}

=========================================
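
If you'd rather check the hex without opening the file in an editor,
something like the following rough sketch should do. It encodes the same
string in memory instead of reading test.utf back, so the class name
HexCheck and the helper toHex are my own inventions for illustration,
not part of the program above.

```java
import java.io.UnsupportedEncodingException;

class HexCheck
{
    // Formats each byte as two lowercase hex digits, separated by spaces.
    static String toHex(byte[] bytes)
    {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < bytes.length; i++)
        {
            int b = bytes[i] & 0xff;          // undo sign extension
            if (i > 0) sb.append(' ');
            if (b < 0x10) sb.append('0');     // pad to two digits
            sb.append(Integer.toHexString(b));
        }
        return sb.toString();
    }

    public static void main(String[] args)
        throws UnsupportedEncodingException
    {
        // Same string the program above writes, encoded in memory.
        byte[] utf8 = "Hello, \u65e5\u672c".getBytes("UTF8");
        System.out.println(utf8.length + " bytes: " + toHex(utf8));
    }
}
```

The seven ASCII characters come out as one byte each and each kanji as
three (e6 97 a5 and e6 9c ac), for 13 bytes in all. The file itself will
have the platform's line separator after that, since the program above
uses println.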

You can use this little program to experiment with a lot of different
encodings, too, by simply replacing the encoding names.
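
For a quicker comparison that doesn't touch the file system, here is a
minimal sketch that just counts how many bytes the same string takes in
each encoding. The class name EncodingSizes and the method encodedLength
are hypothetical, and the two default encoding names are only the ones I
assume every runtime accepts; which other names work depends on your
JDK.

```java
import java.io.UnsupportedEncodingException;

class EncodingSizes
{
    // Returns the number of bytes text occupies in the named encoding,
    // or -1 if this Java runtime does not recognize the encoding name.
    static int encodedLength(String text, String encoding)
    {
        try
        {
            return text.getBytes(encoding).length;
        }
        catch (UnsupportedEncodingException e)
        {
            return -1;
        }
    }

    public static void main(String[] args)
    {
        String text = "Hello, \u65e5\u672c";

        // Default names are assumptions; pass your own on the command line.
        String[] names = (args.length > 0)
                         ? args
                         : new String[] {"UTF8", "ISO8859_1"};

        for (int i = 0; i < names.length; i++)
        {
            int n = encodedLength(text, names[i]);
            System.out.println(names[i] + ": "
                + (n < 0 ? "not supported" : n + " bytes"));
        }
    }
}
```

With the defaults, UTF8 should report 13 bytes and ISO8859_1 should
report 9, because Latin-1 has no kanji and each one gets replaced by a
single "?" byte. Try "java EncodingSizes SJIS EUC_JP" if your runtime
includes the Japanese converters.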

Viva Java!
__Glen__
glen.perkins@nativeguide.com


