RE: Conversion of DBCS / MBCS characters to UTF8

From: Paul Dempsey (paulde@Exchange.Microsoft.com)
Date: Fri Jan 14 2000 - 13:31:33 EST


> -----Original Message-----
> From: Kedar Moghe [mailto:kmoghe@quark.com.sg]
> Sent: Thursday, January 13, 2000 7:13 PM
> To: 'info@unicode.org'
> Subject: I need some support.

> ...
> requires conversion of DBCS / MBCS characters to UTF8 and vice versa.
> Basically, it is Windows/NT environment. ... Win32 based code
> specifically. ...
>
> If you can suggest a platform independent solution then it will be
> better.

Triangulate through Unicode (UTF-16 on Win32), using these Win32 functions:

To go from DBCS/MBCS to UTF-8, convert DBCS/MBCS to Unicode with
MultiByteToWideChar, then convert Unicode to UTF-8 with WideCharToMultiByte.
The codepage number for UTF-8 is 65001 (CP_UTF8).

To go from UTF-8 to MBCS/DBCS, convert UTF-8 to Unicode with
MultiByteToWideChar (again with codepage 65001), then convert Unicode to
MBCS/DBCS with WideCharToMultiByte.
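
Since a platform-independent route was also asked for, here is a minimal
sketch of the same two-step triangulation in Python instead of the Win32
calls. The function names and the choice of cp932 (Japanese Shift_JIS as used
on Windows) as the example codepage are assumptions for illustration only;
substitute whatever codec matches your data.

```python
def mbcs_to_utf8(data: bytes, codepage: str = "cp932") -> bytes:
    """Legacy multibyte bytes -> UTF-8 bytes, going through Unicode."""
    text = data.decode(codepage)   # step 1: MBCS/DBCS -> Unicode
    return text.encode("utf-8")    # step 2: Unicode -> UTF-8


def utf8_to_mbcs(data: bytes, codepage: str = "cp932") -> bytes:
    """UTF-8 bytes -> legacy multibyte bytes, going through Unicode."""
    text = data.decode("utf-8")    # step 1: UTF-8 -> Unicode
    return text.encode(codepage)   # step 2: Unicode -> MBCS/DBCS


if __name__ == "__main__":
    sjis = b"\x93\xfa\x96\x7b"     # two kanji encoded in Shift_JIS
    utf8 = mbcs_to_utf8(sjis)
    print(utf8.hex())
    print(utf8_to_mbcs(utf8) == sjis)
```

With the default strict error handling, the second function raises
UnicodeEncodeError when the text contains a character the target codepage
cannot represent, which is usually what you want to find out about rather
than silently lose.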

The appropriate NLS file for the MBCS/DBCS codepage must be installed on the
machine.

Win95 (not sure about Win98) does not support codepage 65001. To convert to
UTF-8 on Win9x, you can use IMultiLanguage or IMultiLanguage2 from MLANG.
Information on IMultiLanguage can be accessed through
http://search.microsoft.com/us/dev/default.asp under the workshop category.
MLANG lets you set up conversions that do the triangulation in one step.

Also, Unicode <-> UTF-8 conversion source code is available on the Unicode
web site.
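
To give an idea of what such a conversion routine does, here is a minimal
sketch of the Unicode -> UTF-8 direction for a single code point, written in
Python. This is not the code from the Unicode web site; the function name
encode_utf8 is made up for illustration, and it only follows the standard
UTF-8 bit layout.

```python
def encode_utf8(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (illustrative sketch)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        # Out of range, or a surrogate, which is not a valid scalar value.
        raise ValueError("not a Unicode scalar value: %r" % cp)
    if cp < 0x80:                  # 1 byte:  0xxxxxxx
        return bytes([cp])
    if cp < 0x800:                 # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:               # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


if __name__ == "__main__":
    for cp in (0x41, 0xE9, 0x65E5, 0x1F600):
        # Compare against Python's built-in encoder as a sanity check.
        print(hex(cp), encode_utf8(cp) == chr(cp).encode("utf-8"))
```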
 
There are a number of platform-independent solutions. The latest Kermit lets
you do it. A search of the Unicode list archive will show a number of
others.

--- Paul


