Re: unicode on Linux

From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Tue Oct 28 2003 - 16:35:30 CST


You should use Unicode internally, which means UTF-16 when you use ICU or most other libraries and software.

Externally, that is, for protocols, files, and other data exchange, you need to identify the encoding
of the data (on input, determine it; on output, label it) and convert between it and Unicode. If you
can choose the output encoding, then stay with one of the Unicode charsets (UTF-8, SCSU, etc.), or
else, if you are absolutely certain that they suffice, use US-ASCII or ISO 8859-1.
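
As a rough illustration of that boundary, here is a minimal sketch in Python rather than ICU's C API. The function names decode_input and encode_for_output are made up for this example, and falling back to UTF-8 when the requested charset cannot represent the text is an assumption of the sketch, not something ICU does for you.

```python
from typing import Tuple


def decode_input(raw: bytes, charset: str) -> str:
    """Convert external bytes to the internal Unicode string.

    The caller must already know (or have determined) the charset.
    """
    return raw.decode(charset)


def encode_for_output(text: str, charset: str = "utf-8") -> Tuple[bytes, str]:
    """Convert internal Unicode text to bytes and return the charset label.

    If the requested charset cannot represent the text, fall back to UTF-8
    (an assumption of this sketch) so that no characters are lost.
    """
    try:
        return text.encode(charset), charset
    except UnicodeEncodeError:
        return text.encode("utf-8"), "utf-8"


if __name__ == "__main__":
    data, label = encode_for_output("caf\u00e9", "iso-8859-1")
    print(label, data)
    print(decode_input(data, label))
```

The label returned with the bytes is what you would put into the protocol header or file metadata, so the receiver does not have to guess.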

The system default encoding or the current process codepage may or may not be a good guess for the
encoding in your input/output. Include a user override of the charset in your design.
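
A possible shape for such an override, again sketched in Python instead of ICU: choose_charset is a hypothetical helper that prefers the user's choice, then the locale's encoding as a guess, then UTF-8 as a last resort. The order of the fallbacks is an assumption of this sketch.

```python
import codecs
import locale
from typing import Optional


def choose_charset(user_override: Optional[str] = None) -> str:
    """Pick a charset name: user override first, then the locale's encoding.

    Candidates that are empty or unknown to the codec registry are skipped.
    UTF-8 is the last resort (an assumption of this sketch).
    """
    candidates = (user_override, locale.getpreferredencoding(False), "utf-8")
    for candidate in candidates:
        if not candidate:
            continue
        try:
            # codecs.lookup() raises LookupError for unknown charset names
            # and returns the codec's normalized name otherwise.
            return codecs.lookup(candidate).name
        except LookupError:
            continue
    return "utf-8"


if __name__ == "__main__":
    print(choose_charset())          # guess taken from the current locale
    print(choose_charset("EUC-JP"))  # explicit user override
```

In a real application the override would typically come from a command-line option or a configuration setting.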

markus

Shao, Yiying wrote:
> * I am using ICU, which uses UTF-16, to handle all strings for cross-platform localization.
>
> * Since UTF-8 is the default locale encoding for Red Hat Linux, I need to convert the strings from UTF-16 to UTF-8. But UTF-8 is not the default for CJK locales. So, on CJK, I need to set UTF-8 as the default locale so that the converted UTF-8 can still work with CJK.
>
> * Or are there better ways to do this? If it is possible to find out the current default locale encoding (such as UTF-16, UTF-8, multi-byte, etc.) at run time for an App, could I then do the correct conversions according to the current locale? ICU provides rich conversion utilities. This way, I can guarantee that my App will work properly and will not screw up other Apps on the same system.


