Re: FW: Subj: Converting from UCS-2 to UTF-8

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Fri Aug 19 2005 - 12:13:33 CDT

    From: "Dean Harding" <dean.harding@dload.com.au>
    To: <unicode@arabink.com>; "'Magda Danish (Unicode)'"
    <v-magdad@microsoft.com>
    Cc: <unicode@unicode.org>
    Sent: Friday, August 19, 2005 1:25 AM
    Subject: RE: FW: Subj: Converting from UCS-2 to UTF-8

    > Gregg Reynolds wrote:
    >> On windows, the easiest thing to do is install cygwin, which comes with
    >> a command line iconv implementation. http://www.cygwin.com/
    >
    > Wouldn't it be easier to use WideCharToMultiByte and pass in CP_UTF8 as the
    > code page identifier? No need to download 3rd party libraries then.

    For those users that still run Windows 95/98/ME, this won't work, as these
    systems can only do the following:

    - WideCharToMultiByte(): can only convert from UTF-16 to the local ANSI or
    OEM 8-bit charset. It cannot even convert to UTF-8! When converting to
    ANSI or OEM, unsupported characters are silently replaced by '?'.

    - MultiByteToWideChar(): can only convert to UTF-16 from the local ANSI or
    OEM 8-bit charset, or from UTF-8. This allows Notepad, for example, to load
    and display a UTF-8 file, and even to edit it, but it CANNOT save it
    correctly: saving silently converts all characters to the ANSI charset and
    replaces missing characters with '?', so the saved file will not be UTF-8
    encoded.
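
    The '?' substitution described above can be illustrated with a minimal
    sketch. This is not how Windows implements it: ucs2_to_latin1_lossy() is
    a hypothetical helper, and ISO-8859-1 is only assumed here as a stand-in
    for a local ANSI codepage (real ANSI codepages use per-codepage mapping
    tables).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a Windows API): converts n UCS-2 code units to
   ISO-8859-1, used here as a stand-in for a local ANSI codepage.
   Any code unit above U+00FF has no Latin-1 byte and is replaced by '?',
   so the conversion is lossy and cannot be reversed.
   dst must have room for n bytes.
   Returns the number of characters that were replaced. */
size_t ucs2_to_latin1_lossy(const uint16_t *src, size_t n, unsigned char *dst)
{
    size_t replaced = 0;

    for (size_t i = 0; i < n; i++) {
        if (src[i] <= 0x00FF) {
            dst[i] = (unsigned char)src[i];   /* direct 1:1 mapping */
        } else {
            dst[i] = '?';                     /* silent substitution */
            replaced++;
        }
    }
    return replaced;
}
```

    A caller that checks the return value can at least detect the data loss
    instead of saving a damaged file silently.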

    For other NT/2000/XP/2003 systems, the conversions offered by the two
    routines require that various charsets or codepages be installed in
    Windows\System32 (these are the cp*.nls files). The list of supported
    codepages seems hardcoded within the system and not extensible, and they can
    only be installed using the Regional Settings control panel (it's not enough
    to just copy the *.nls files).

    I don't know if it's even possible to add more codepages than those
    supported on each version of Windows. I didn't find any place in the
    registry where those codepages are actually registered: the existing
    entries just seem to be there to allow compatibility with other versions of
    Windows by mapping the actual filenames used for the codepage mappings.

    The restrictions above seem to exist for security reasons (maps should not be
    replaceable, as that would affect the compatibility between the Unicode and
    ANSI Win32 APIs), and Microsoft does not provide any information about how
    to develop and install new codepages.

    So there are still applications needing converters based on other routines
    and mappings.
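
    For the UTF-16 to UTF-8 case specifically, no mapping table is needed, so
    such a converter can be written by hand. The following is only a sketch
    under stated assumptions: utf16_to_utf8() is a hypothetical helper (not a
    Windows API), the input is in native byte order without a BOM, and
    unpaired surrogates are replaced by U+FFFD, which is one possible policy
    among several.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a Windows API): converts n UTF-16 code units in
   native byte order to UTF-8.
   dst must have room for 3 * n bytes, the worst case: one BMP code unit
   yields at most 3 bytes, and a surrogate pair (2 units) yields 4.
   Unpaired surrogates are replaced by U+FFFD (an assumed policy).
   Returns the number of bytes written; no terminator is appended. */
size_t utf16_to_utf8(const uint16_t *src, size_t n, unsigned char *dst)
{
    size_t out = 0;

    for (size_t i = 0; i < n; i++) {
        uint32_t cp = src[i];

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: needs a low surrogate right after it. */
            if (i + 1 < n && src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (src[i + 1] - 0xDC00);
                i++;                      /* consume the low surrogate */
            } else {
                cp = 0xFFFD;              /* unpaired high surrogate */
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;                  /* unpaired low surrogate */
        }

        if (cp < 0x80) {                  /* 1 byte: 0xxxxxxx */
            dst[out++] = (unsigned char)cp;
        } else if (cp < 0x800) {          /* 2 bytes: 110xxxxx 10xxxxxx */
            dst[out++] = (unsigned char)(0xC0 | (cp >> 6));
            dst[out++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {        /* 3 bytes */
            dst[out++] = (unsigned char)(0xE0 | (cp >> 12));
            dst[out++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            dst[out++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {                          /* 4 bytes */
            dst[out++] = (unsigned char)(0xF0 | (cp >> 18));
            dst[out++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            dst[out++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            dst[out++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

    Pure UCS-2 input never contains surrogates, so for such data only the
    1-, 2- and 3-byte branches are exercised. Conversions to or from legacy
    8-bit codepages would still need external mapping tables.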
