Re: MES as an ISO standard?

From: Glen Perkins (gperkins@netcom.com)
Date: Wed Jul 02 1997 - 14:22:13 EDT


Pierre Lewis <lew@nortel.ca> wrote:
>
> In message "Re: MES as an ISO standard?",
> 'Dan.Oscarsson@trab.se' writes:
>
> > Maybe you don't care, but I do. Java is a wonderful language just because
> > it is the only one of the great internationally used programming languages
> > that allow me to use names on variables and routines in my own language!
>
> Java is a wonderful language for me too, not because it supports
> Unicode in variable names, but because it supports Unicode at run
> time. The former is much lower on my list of priorities.
>

It's not a low priority for Japanese programmers. Without it, they have
to take the concept represented by a class or variable and either think
of the English word for it or remember the rules of romanization and
write it in romanized Japanese. Both are annoying. Both are also quite
error prone, because English spelling is even more of a challenge for
the Japanese than it is for native speakers, and because the same word
or expression has many valid romanizations (Tokyo, Toukyou and Tōkyō
all turn up). No two members of a Japanese programming team are likely
to romanize a given Japanese word the same way without first agreeing
on a convention.

With the new Japanese JDK, Japanese programmers can write their source
code in Shift-JIS, using the real Japanese names for things, and the
compiler will convert the identifiers into UTF-8 in the class files.
Very convenient.
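A small sketch of what that buys you, written against a current JDK
rather than the 1997 Japanese JDK. The class name and the field are
hypothetical. The identifier is spelled with Unicode escapes so the file
stays plain ASCII; a real Shift-JIS source file would instead be compiled
with something like `javac -encoding Shift_JIS`. The last line prints the
identifier's UTF-8 bytes, which for these two characters is also how the
class file's (modified) UTF-8 stores the name.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class. The field name is written with Unicode escapes so
// this file stays plain ASCII; javac turns \u65e5\u672c into the two kanji of
// "Nihon" before tokenizing, so the field really is named in Japanese.
public class JapaneseIdentifiers {
    static int \u65e5\u672c = 1;

    // True if every character of name is legal in a Java identifier
    // (keywords such as "int" are not checked here).
    static boolean isJavaIdentifier(String name) {
        if (name.isEmpty() || !Character.isJavaIdentifierStart(name.charAt(0))) {
            return false;
        }
        for (int i = 1; i < name.length(); i++) {
            if (!Character.isJavaIdentifierPart(name.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Formats bytes as space-separated two-digit uppercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String name = "\u65e5\u672c";
        System.out.println(isJavaIdentifier(name));  // true
        System.out.println(\u65e5\u672c);             // 1
        // The identifier's UTF-8 bytes
        System.out.println(hex(name.getBytes(StandardCharsets.UTF_8)));
        // E6 97 A5 E6 9C AC
    }
}
```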

> I'd much rather, for example, see Java provide us with a UTF-8
> input/output mechanism (more general than the one I've found so far)

Oh, yeah. Input. Okay, here it is in both directions:

==============================================
TO OUTPUT IN UTF-8 WITHOUT A BYTE COUNT:

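import java.io.*;

try
{
  // chars -> buffered -> encoded as UTF-8 -> bytes written to the file
  PrintWriter out = new PrintWriter(
           new BufferedWriter(
           new OutputStreamWriter(
           new FileOutputStream("out.unicode"), "UTF8")));

  out.println("Hello, \u65e5\u672c");  // "Hello, Nihon" in kanji
  out.close();                         // flushes the buffer and closes the file
}
catch (IOException e) {System.out.println(e);}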

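Output (in hex; println ends the line with the platform separator, CR LF
on Windows and LF alone on Unix):
48 65 6C 6C 6F 2C 20 E697A5 E69CAC 0D 0A
H  e  l  l  o  ,  sp ni     hon    CR LF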
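To check the byte listing without a hex editor, here is a sketch that
pushes the same string through an OutputStreamWriter with the "UTF8"
charset name, but into a ByteArrayOutputStream instead of a file. The
class and method names are made up for the example, and CR LF is written
explicitly so the result doesn't depend on the platform.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

public class Utf8HexDump {
    // Encodes text with the "UTF8" charset name, as in the PrintWriter
    // recipe, but into memory instead of a file.
    static byte[] encodeUtf8(String text) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try {
            Writer w = new OutputStreamWriter(buf, "UTF8");
            w.write(text);
            w.close();  // flushes the encoder's pending bytes into buf
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Formats bytes as space-separated two-digit uppercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // CR LF spelled out so the result does not depend on the platform
        System.out.println(hex(encodeUtf8("Hello, \u65e5\u672c\r\n")));
        // 48 65 6C 6C 6F 2C 20 E6 97 A5 E6 9C AC 0D 0A
    }
}
```

Each kanji comes out as three bytes, which is why the seven ASCII
characters plus two kanji plus CR LF make fifteen bytes in all.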

TO OUTPUT IN UTF-8 WITH THE BYTE COUNT (USEFUL FOR FILES CONTAINING
MIXED DATA TYPES WHERE YOU NEED TO KNOW WHERE YOUR STRINGS END AND YOUR
ints BEGIN):

import java.io.*;

try
{
  DataOutputStream dos = new DataOutputStream(
                         new FileOutputStream("out.uni"));

  dos.writeUTF("Hello");  // length prefix, then the encoded bytes
  dos.close();
}
catch (IOException e) {System.out.println(e);}

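Outputs an unsigned two-byte (big-endian) count of the bytes (not chars)
to follow, then "Hello" in Java's modified UTF-8. That matches standard
UTF-8 except that NUL is written as C0 80 and a character outside the
BMP is written as two 3-byte surrogates:
00 05 48 65 6C 6C 6F
(5)   H  e  l  l  o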
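A sketch of why the count matters, assuming everything happens in memory
(ByteArrayOutputStream/ByteArrayInputStream standing in for the file, and
the class and method names invented for the example). It writes a string
followed by an int and reads both back: readUTF consumes exactly the
number of bytes in the prefix, so readInt starts in the right place. It
also shows the modified-UTF-8 encoding of NUL.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class WriteUtfDemo {
    // Returns exactly the bytes DataOutputStream.writeUTF produces for s.
    static byte[] writeUtfBytes(String s) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream dos = new DataOutputStream(buf);
            dos.writeUTF(s);
            dos.close();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes a string then an int, reads both back, returns "string|int".
    static String roundTrip(String s, int n) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream dos = new DataOutputStream(buf);
            dos.writeUTF(s);
            dos.writeInt(n);
            dos.close();

            DataInputStream dis = new DataInputStream(
                    new ByteArrayInputStream(buf.toByteArray()));
            String back = dis.readUTF();  // stops after the prefixed byte count
            int m = dis.readInt();        // so the int is read intact
            return back + "|" + m;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Formats bytes as space-separated two-digit uppercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(writeUtfBytes("Hello")));         // 00 05 48 65 6C 6C 6F
        System.out.println(hex(writeUtfBytes("\u65e5\u672c")));  // 00 06 E6 97 A5 E6 9C AC
        // NUL becomes C0 80 here, where standard UTF-8 would use a single 00
        System.out.println(hex(writeUtfBytes("\0")));            // 00 02 C0 80
        System.out.println(roundTrip("Hello", 42));              // Hello|42
    }
}
```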

TO INPUT UTF-8 TEXT:

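import java.io.*;

try
{
  // Decode the file's bytes as UTF-8 and read it a line at a time
  BufferedReader in = new BufferedReader(
            new InputStreamReader(
            new FileInputStream("out.unicode"), "UTF8"));

  String inputStr = in.readLine();  // null if the file is empty
  in.close();
}
catch (IOException e) {System.out.println(e);}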
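A sketch of the same reader chain run over an in-memory copy of the
"Hello, Nihon" bytes listed earlier (a ByteArrayInputStream stands in
for FileInputStream; the class and method names are hypothetical). It
also shows why the charset name has to match the file: decoding the same
bytes as ISO-8859-1 yields one char per byte instead of the two kanji.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;

public class Utf8LineReader {
    // Reads the first line of data, decoded with the named charset.
    static String firstLine(byte[] data, String charsetName) {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data), charsetName))) {
            return in.readLine();  // line terminator (CR LF here) is stripped
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "Hello, " + the two kanji + CR LF, as in the output example
        byte[] data = {
            0x48, 0x65, 0x6C, 0x6C, 0x6F, 0x2C, 0x20,
            (byte) 0xE6, (byte) 0x97, (byte) 0xA5,
            (byte) 0xE6, (byte) 0x9C, (byte) 0xAC,
            0x0D, 0x0A
        };
        String good = firstLine(data, "UTF8");
        System.out.println(good.equals("Hello, \u65e5\u672c"));  // true
        // Wrong charset: 13 single-byte chars instead of 9 chars
        String wrong = firstLine(data, "ISO-8859-1");
        System.out.println(wrong.length());                      // 13
    }
}
```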

=======================================

There. I think that's a pretty nice way to do it.

__Glen__
glen.perkins@nativeguide.com


