(no subject)

From: Siddhant Kaul (kaulsn@acs.wooster.edu)
Date: Thu Jul 03 1997 - 08:57:11 EDT


> Pierre Lewis <lew@nortel.ca> wrote:
> >
> > In message "Re: MES as an ISO standard?",
> > 'Dan.Oscarsson@trab.se' writes:
> >
> > > Maybe you don't care, but I do. Java is a wonderful language just because
> > > it is the only one of the great internationally used programming languages
> > > that allow me to use names on variables and routines in my own language!
> >
> > Java is a wonderful language for me too, not because it supports
> > Unicode in variable names, but because it supports Unicode at run
> > time. The former is much lower on my list of priorities.
> >
>
> It's not a low priority for Japanese programmers. They usually have to
> try to take the concept represented by their class or variable, and try
> to think of the English word for it or try to remember the rules of
> romanization and use romanized Japanese. Both are annoying. They're also
> quite error prone because English spelling is even more of a challenge
> for the Japanese than it is for native speakers, and because of the wide
> variation in valid romanizations of the same word or expression. No two
> members of a Japanese programming team are likely to automatically
> romanize a Japanese word the same way.
>
> With the new Japanese JDK, Japanese programmers can write their source
> code in Shift-JIS, using the real Japanese names for things, and the
> compiler will convert the identifiers into UTF-8 in the class files.
> Very convenient.
>
> > I'd much rather, for example, see Java provide us with a UTF-8
> > input/output mechanism (more general than the one I've found so far)
>
> Oh, yeah. Input. Okay, here it is in both directions:
>
> ==============================================
> TO OUTPUT IN UTF-8 WITHOUT A BYTE COUNT:
>
> try
> {
>     PrintWriter out = new PrintWriter(
>         new BufferedWriter(
>             new OutputStreamWriter(
>                 new FileOutputStream("out.unicode"), "UTF8")));
>
>     out.println("Hello, \u65e5\u672c");
>     out.close();
> }
> catch (Exception e) {System.out.println(e);}
>
> Output:
> 48 65 6C 6C 6F 2C 20 E697A5 E69CAC 0D 0A
> H e l l o , ni hon CR LF
>
> TO OUTPUT IN UTF-8 WITH THE BYTE COUNT (USEFUL FOR FILES CONTAINING
> MIXED DATA TYPES WHERE YOU NEED TO KNOW WHERE YOUR STRINGS END AND YOUR
> intS BEGIN):
>
> try
> {
>     DataOutputStream dos = new DataOutputStream(
>         new FileOutputStream("out.uni"));
>
>     dos.writeUTF("Hello");
>     dos.close();
> }
> catch (Exception e) {System.out.println(e);}
>
> Outputs a short containing the number of bytes (not chars) to follow,
> followed by "Hello" in UTF-8:
> 0005 48 65 6C 6C 6F
> 5B H e l l o
>
> TO INPUT UTF-8 TEXT:
>
> try
> {
>     BufferedReader in = new BufferedReader(
>         new InputStreamReader(
>             new FileInputStream("Sample.sjis"), "UTF8"));
>
>     inputStr = in.readLine();
> }
> catch (Exception e) {System.out.println(e);}
>
> =======================================
>
> There. I think that's a pretty nice way to do it.
>
> __Glen__
> glen.perkins@nativeguide.com

Is there any way to construct a string using UTF-8 characters? I tried it
using the following code:

public static byte[] convert(byte[] inBytes, String inEnc, String outEnc)
    throws UnsupportedEncodingException
{
    return new String(inBytes, inEnc).getBytes(outEnc);
}

and then constructing the string using the String(byte b[])
constructor. It didn't seem to work, returning a null string.

Siddhant Kaul
Student researcher,
The College of Wooster
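
A minimal sketch of one way to build a String from UTF-8 bytes, offered as
an illustration rather than as a diagnosis of the code above. The
String(byte[]) constructor decodes with the platform's default encoding, so
bytes that are really UTF-8 can come out wrong unless the encoding is named
explicitly. The class and method names (Utf8StringDemo, fromUtf8) are
hypothetical, and the sketch assumes a JDK that provides
java.nio.charset.StandardCharsets (Java 7 or later) rather than the
encoding-name strings used in the quoted examples.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative, not from the thread.
class Utf8StringDemo {

    // Decode UTF-8 bytes into a String by naming the charset explicitly,
    // instead of relying on the platform default used by String(byte[]).
    static String fromUtf8(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    // Same idea as the convert() helper in the message: decode the input
    // with inEnc, then re-encode the resulting String with outEnc.
    static byte[] convert(byte[] inBytes, String inEnc, String outEnc) {
        return new String(inBytes, Charset.forName(inEnc))
                .getBytes(Charset.forName(outEnc));
    }

    public static void main(String[] args) {
        // "Hi " followed by U+65E5 U+672C, encoded as UTF-8
        // (the same E697A5 E69CAC byte sequences shown in the quoted dump).
        byte[] utf8 = {
            'H', 'i', ' ',
            (byte) 0xE6, (byte) 0x97, (byte) 0xA5,
            (byte) 0xE6, (byte) 0x9C, (byte) 0xAC
        };

        String s = fromUtf8(utf8);
        System.out.println(s.length());                   // prints 5
        System.out.println(s.equals("Hi \u65e5\u672c"));  // prints true

        // Re-encode as big-endian UTF-16: 5 chars, 2 bytes each, no BOM.
        byte[] utf16 = convert(utf8, "UTF-8", "UTF-16BE");
        System.out.println(utf16.length);                 // prints 10
    }
}
```

The point of the sketch is that the encoding has to be stated at the moment
the bytes become a String; converting the byte array to another encoding
first and then calling String(byte[]) leaves the final decoding step up to
the platform default.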



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:35 EDT