Re: unicode on Linux

From: Edward H. Trager (ehtrager@umich.edu)
Date: Tue Oct 21 2003 - 07:35:43 CST

In reply to: Shao, Yiying: "RE: unicode on Linux"

On Monday 2003.10.20 13:31:49 -0700, Shao, Yiying wrote:
> Thanks for your info.
>
> >>Just wondering if anybody knows how well Unicode is supported on Linux?
> >>
> >Very good support. UTF-8 is the default charset in recent versions of some popular
> >distributions.
>
> What are those popular distributions, and which versions?
>
>
> >>On Red Hat Linux, if UTF-8 is not the default encoding for Chinese/Japanese/Korean, what is it using for those double-byte languages?
>
> >The old multi-byte character sets.
>
> So, how should I implement my code? Do I have to check whether the text is Japanese (for example) and then convert the Unicode (UTF-8) to a multi-byte character set? That seems very painful.
>
No. Forget about old multi-byte encodings. Just set your locale to a UTF-8 locale and use UTF-8
for all languages. In my experience (on SuSE 7.3, 8.1, 8.2, and the 9.0 betas) all of the "important"
applications handle CJK languages perfectly well under a UTF-8 locale. The "important" applications
for me are things like OpenOffice.org 1.1, Konsole, vim, MySQL, and Mozilla. For CJK input, use SCIM
(http://ns.turbolinux.com.cn/~suzhe/scim/index.html). For many other details about Unicode
on Linux, see my page at http://eyegene.ophthy.med.umich.edu/unicode/index.html.
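A minimal Python sketch of checking, from inside a program, that it really is running under a UTF-8 locale. It assumes a Unix-like system (locale.nl_langinfo is not available on Windows) and that the chosen locale, for example ja_JP.UTF-8, is installed; the function name locale_is_utf8 is hypothetical.

```python
import locale


def locale_is_utf8() -> bool:
    """Report whether the locale taken from LC_ALL / LC_CTYPE / LANG uses UTF-8."""
    try:
        # Same effect as setlocale(LC_ALL, "") in C: adopt the user's environment.
        locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        # The requested locale is not installed on this system; use plain "C".
        locale.setlocale(locale.LC_ALL, "C")
    # Codeset of the active LC_CTYPE, e.g. "UTF-8" or "ANSI_X3.4-1968".
    codeset = locale.nl_langinfo(locale.CODESET)
    # Normalise spellings such as "utf8" and "UTF-8" before comparing.
    return codeset.replace("-", "").lower() == "utf8"


if __name__ == "__main__":
    ok = locale_is_utf8()
    print(locale.nl_langinfo(locale.CODESET), "->",
          "UTF-8 locale" if ok else "not a UTF-8 locale")
```

For example, run it as LANG=ja_JP.UTF-8 python3 check_locale.py (the script name is only illustrative). Asking for the codeset, rather than parsing $LANG by hand, also respects LC_ALL and LC_CTYPE overrides.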

> >>Do later versions of Red Hat Linux make UTF-8 the default encoding for them?
>
> >AFAIK only if you manually set it to a UTF-8 locale, e.g.
> >LANG=zh_CN.UTF-8. Notice, though, that some older software will not be
> >aware of this change, so many characters will not be displayed properly.
>
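To illustrate the display problem with older software: a program that is unaware of UTF-8 and treats each byte as one character sees the wrong number of characters and garbage glyphs. This Python sketch models such a program with Latin-1 decoding, which maps every byte to some character; it is one illustrative failure mode, not a description of any particular application.

```python
text = "中文"  # two Chinese characters
utf8_bytes = text.encode("utf-8")  # UTF-8 uses three bytes for each of these

# A UTF-8-unaware program that assumes one byte per character
# (modelled here with Latin-1) reads six "characters" of garbage
# instead of two ideographs.
misread = utf8_bytes.decode("latin-1")

if __name__ == "__main__":
    print(len(text), len(utf8_bytes), len(misread))
    print(repr(misread))
```

Running it prints 2 6 6 on the first line: two real characters, six UTF-8 bytes, six misread characters.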
> So, is this setting available from Red Hat 8.0 onward? Also, do you mean some older versions of Linux may not be aware of this setting?
>
>
> Also, do you happen to know IBM's ICU? Does it take care of the Unicode problems with double-byte languages on Linux?

Most likely. But I think your life will be easier if you just use UTF-8 for all languages and forget about legacy
encodings. I would expect ICU to have robust UTF-8 support.
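A small Python sketch of why juggling legacy encodings is painful: EUC-JP, a common Japanese multi-byte encoding, cannot represent Korean Hangul at all, while UTF-8 handles all three languages through one code path. The sample strings and the helper encodable() are hypothetical; the encoding names are Python codec names.

```python
# Hypothetical sample strings, one per CJK language.
SAMPLES = {
    "Japanese": "日本語",
    "Chinese": "中文",
    "Korean": "한국어",
}


def encodable(text: str, encoding: str) -> bool:
    """Return True if every character of text can be represented in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    for language, text in SAMPLES.items():
        print(f"{language:9} utf-8={encodable(text, 'utf-8')} "
              f"euc_jp={encodable(text, 'euc_jp')}")
```

The UTF-8 column is True for every sample, while EUC-JP rejects the Hangul. The Chinese sample fits in EUC-JP only because those two ideographs are also used in Japanese; in general each legacy encoding covers one language group, which is the per-language conversion the original question hoped to avoid.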

>
> Thanks,
> Yiying


This archive was generated by hypermail 2.1.5 : Thu Jan 18 2007 - 15:54:24 CST