From: Addison Phillips [wM] (firstname.lastname@example.org)
Date: Mon Mar 15 2004 - 10:35:44 EST
There are two things that you need to check here.
First, is your environment set up to display the non-ASCII characters?
Solaris offers an impressive array of UTF-8 locales which should allow you
to view Unicode data. You can switch to one of these by setting your LANG
environment variable (although this may not solve font problems and other
issues). Use the command 'locale -a' to list the available locales on your
machine and look for one that looks like (for example) 'en_US.UTF-8'. [You
may also be able to use a locale compatible with your data. An EUC-JP
locale, for example, will display Japanese characters on the console.]
Note that changing your locale on Unix isn't the whole solution. You may
also have to install fonts that cover the language of your data. Without
them, correctly decoded characters show up as hollow boxes rather than
question marks.
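Be sure LANG is set in the environment of the Java process before your
program starts. For example, in a Bourne-style shell:
% LANG=en_US.UTF-8 java -cp ...
(Use the exact locale name that 'locale -a' reports on your machine.)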
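To confirm that the JVM actually picked up the locale, it can help to print what it thinks the defaults are. The following is a minimal sketch, not part of the original advice: the class name ShowEncoding is hypothetical, and it assumes a JVM new enough to have Charset.defaultCharset() (Java 5 or later). On recent JVMs (18 and later) the default charset is UTF-8 regardless of LANG, so the value is most informative on older runtimes.

```java
import java.nio.charset.Charset;
import java.util.Locale;

// Hypothetical helper class; the name is illustrative only.
class ShowEncoding {
    // Builds a short report of the settings the JVM picked up at startup.
    static String report() {
        return "default charset: " + Charset.defaultCharset().name() + "\n"
             + "file.encoding:   " + System.getProperty("file.encoding") + "\n"
             + "default locale:  " + Locale.getDefault();
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```

Running it once with your current LANG and once with a UTF-8 locale should show whether the setting reaches the Java process at all.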
The second issue you may encounter: is your data actually making it into the
database? If your database is not configured to use a Unicode encoding (or
at least a multibyte encoding compatible with your data), then the question
marks are being created by the database when you store the data originally.
How database encodings are configured and how you retrieve that information
vary by database. I have a whitepaper at
http://www.inter-locale.com/IUC19.pdf (which is rather stale, but has some
useful information). You might check in your Java program whether the
Strings you get back already contain question marks (U+003F). That would
indicate a problem with the database or (rarely) the JDBC driver
configuration, rather than with your terminal.
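One way to do that check without depending on the terminal is to print the code points of the String in ASCII. The sketch below is illustrative only: the class and method names are hypothetical, and the lossy round trip through ISO-8859-1 merely simulates what a database with an incompatible encoding might do. It is not actual database or JDBC behavior. In a real program the String would come from ResultSet.getString().

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the names are illustrative only.
class CodePointDump {
    // Formats each code point of s as U+XXXX, separated by spaces.
    static String toHex(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    // True if the string contains a literal question mark (U+003F).
    static boolean hasQuestionMark(String s) {
        return s.indexOf('?') >= 0;
    }

    public static void main(String[] args) {
        String original = "\u65E5\u672C\u8A9E"; // three Japanese characters
        // Simulated lossy storage: ISO-8859-1 cannot represent these
        // characters, so each one is replaced by '?' during encoding.
        String stored = new String(
            original.getBytes(StandardCharsets.ISO_8859_1),
            StandardCharsets.ISO_8859_1);

        System.out.println(toHex(original));         // U+65E5 U+672C U+8A9E
        System.out.println(toHex(stored));           // U+003F U+003F U+003F
        System.out.println(hasQuestionMark(stored)); // true
    }
}
```

If the dump of a value read from the database shows U+003F where you expected other code points, the data was already damaged before it reached your display.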
Finally, you should check your code. If you are just writing a little
console app and your database is correctly configured, the problem may just
be the locale and setup of your Solaris box as noted above. If you are
having problems with text files, check your use of OutputStreamWriter to
ensure that you specify the encoding explicitly (and don't rely on the
default system encoding, which is affected by your runtime locale). Writing
the data to a file as UTF-8 (instead of relying on System.out.println())
will let you use the native2ascii utility or other programs to investigate
the actual codepoints you are retrieving.
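A minimal sketch of writing a file with an explicit encoding follows. The class name Utf8Dump, the method writeUtf8, and the file name dump.txt are hypothetical, and the sample text stands in for whatever your program retrieves.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the names are illustrative only.
class Utf8Dump {
    // Writes text to file as UTF-8, independent of the runtime locale.
    static void writeUtf8(File file, String text) throws IOException {
        try (Writer out = new OutputStreamWriter(
                new FileOutputStream(file), StandardCharsets.UTF_8)) {
            out.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for data retrieved from the database.
        writeUtf8(new File("dump.txt"), "\u65E5\u672C\u8A9E\n");
    }
}
```

On JDKs that still ship the tool (it was removed in JDK 9), something like 'native2ascii -encoding UTF-8 dump.txt' then shows the contents as \uXXXX escapes. A hex dump of the file works as well.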
Addison P. Phillips
Director, Globalization Architecture
webMethods | Delivering Global Business Visibility
Chair, W3C Internationalization (I18N) Working Group
Chair, W3C-I18N-WG, Web Services Task Force
Internationalization is an architecture.
It is not a feature.
> -----Original Message-----
> From: email@example.com [mailto:firstname.lastname@example.org]On
> Behalf Of Manga
> Sent: Monday, March 15, 2004 07:08
> To: email@example.com
> Subject: multibyte char display
> I use UTF-8 encoding in Java code to store multibyte characters in the
> db. When I retrieve the multibyte characters from the db, I see
> "?" in place of the actual multibyte characters. I use Solaris.
> Is there any environment variable which I can set to see the actual
> characters in my terminal window?
This archive was generated by hypermail 2.1.5 : Mon Mar 15 2004 - 11:25:08 EST