Re: unicode on Linux

From: Edward H. Trager
Date: Tue Oct 21 2003 - 07:35:43 CST

On Monday 2003.10.20 13:31:49 -0700, Shao, Yiying wrote:
> Thanks for your info.
> >>Just wondering if anybody knows how well Unicode is supported on Linux?
> >>
> >Very good support. It is the default charset in recent versions of some
> >popular distributions.
> Which popular distributions are those, and which versions?
> >>On Red Hat Linux, if UTF-8 is not the default encoding for Chinese/Japanese/Korean, what is it using for those double-byte languages?
> >The old multi-byte character sets.
> So, how should I implement my code? Do I have to check whether the text is, for example, Japanese and, if so, convert the Unicode (UTF-8) text to a multi-byte encoding? That seems very painful.
No. Forget about old multi-byte encodings. Just set your locale to a UTF-8 locale and use UTF-8
for all languages. In my experience (on SuSE 7.3, 8.1, 8.2, and the 9.0 betas) all of the "important"
applications handle CJK languages perfectly well under a UTF-8 locale. The "important" applications
for me are things like Open Office 1.1, Konsole, vim, MySQL, and Mozilla. For CJK input, use SCIM.
For many other details about Unicode on Linux, see my page at
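
As a rough illustration of "set your locale to a UTF-8 locale", the Python sketch below works out which locale name governs character handling and whether that name specifies a UTF-8 codeset. It assumes the usual POSIX precedence (LC_ALL, then LC_CTYPE, then LANG) and only inspects the name string; the helpers effective_ctype_locale and is_utf8_locale are hypothetical, not part of any library.

```python
import os
from typing import Mapping, Optional


def effective_ctype_locale(env: Mapping[str, str]) -> Optional[str]:
    """Return the locale name that governs character handling (LC_CTYPE).

    POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
    Empty values are treated as unset.
    """
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            return value
    return None


def is_utf8_locale(name: Optional[str]) -> bool:
    """True if a locale name such as 'zh_CN.UTF-8' names a UTF-8 codeset.

    Only the name is inspected; names without a codeset ('C', 'en_US')
    are reported as not UTF-8.
    """
    if not name or "." not in name:
        return False
    # Drop any '@modifier' suffix, e.g. 'sr_RS.UTF-8@latin'.
    codeset = name.split(".", 1)[1].split("@", 1)[0]
    # glibc accepts both the 'UTF-8' and 'utf8' spellings.
    return codeset.replace("-", "").lower() == "utf8"


if __name__ == "__main__":
    name = effective_ctype_locale(os.environ)
    print(name, "->", "UTF-8" if is_utf8_locale(name) else "not UTF-8")
```

For example, with LANG=zh_CN.UTF-8 and neither LC_ALL nor LC_CTYPE set, this reports a UTF-8 locale. Because it only looks at the name, a program that needs the C library's actual answer could instead call locale.setlocale(locale.LC_CTYPE, "") and then locale.nl_langinfo(locale.CODESET).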

> >>Do later releases of Red Hat Linux make UTF-8 the default encoding for them?
> >AFAIK only if you manually set it to a UTF-8 locale, e.g.
> >LANG=zh_CN.UTF-8. Note, though, that some older software will not be
> >aware of this change, so many characters will not be displayed properly.
> So, is this setting available in Red Hat 8.0 and later? Also, do you mean that some older versions of Linux may not be aware of this setting?
> Besides that, do you happen to know ICU from IBM? Does it handle the Unicode problems with double-byte languages on Linux?

Most likely. But I think your life will be easier if you just use UTF-8 for all languages and forget about legacy
encodings. I'm sure ICU must have very robust UTF-8 support.
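
A minimal Python sketch of "use UTF-8 for all languages": the same encode call serves Chinese, Japanese, and Korean text with no per-language branch, and if legacy multi-byte data does turn up (a Shift_JIS, GB2312, or EUC-KR file, say), it is decoded once where it enters the program and carried as UTF-8 from then on. The function names to_utf8 and legacy_to_utf8 are hypothetical, and the sample strings are illustrative only.

```python
# Illustrative sample text, one string per CJK language.
SAMPLES = {
    "Chinese": "中文",
    "Japanese": "日本語",
    "Korean": "한국어",
}


def to_utf8(text: str) -> bytes:
    """Encode any text as UTF-8; no per-language special cases are needed."""
    return text.encode("utf-8")


def legacy_to_utf8(data: bytes, legacy_encoding: str) -> bytes:
    """Convert bytes in a legacy multi-byte encoding (e.g. 'shift_jis',
    'gb2312', 'euc_kr') to UTF-8. Raises UnicodeDecodeError on bad input."""
    return data.decode(legacy_encoding).encode("utf-8")


if __name__ == "__main__":
    for language, text in SAMPLES.items():
        encoded = to_utf8(text)
        print(language, len(text), "characters ->", len(encoded), "UTF-8 bytes")

    # A Shift_JIS byte string, as it might arrive from an old Japanese file.
    sjis = "日本語".encode("shift_jis")
    print(legacy_to_utf8(sjis, "shift_jis").decode("utf-8"))
```

Each of these CJK characters takes 3 bytes in UTF-8 rather than 2 in the legacy double-byte encodings, so "中文" becomes 6 bytes; the trade-off is one encoding everywhere instead of per-language conversion code.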

> Thanks,
> Yiying
