Re: FW: Subj: Converting from UCS-2 to UTF-8

From: Samuel Thibault (samuel.thibault@ens-lyon.org)
Date: Thu Aug 18 2005 - 15:11:13 CDT


    Magda Danish (Unicode) wrote on Thu 18 Aug 2005 12:42:21 -0700:
    > But it's not clear to me that I can use any of these programs for UCS-2. I am aware that UTF-16 and UCS-2 are almost identical, but it's the "almost" that worries me. Can you confirm that the converter from UTF-16 to UTF-8 will work for converting from UCS-2 to UTF-8 without any loss or corruption of data?

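    It will only work if the text doesn't contain Unicode characters
    at or above U+10000, i.e. if none of its 16-bit units falls in the
    surrogate range U+D800 to U+DFFF, which a UTF-16 converter would
    interpret as halves of surrogate pairs.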
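A minimal Python sketch of the check implied above, assuming the UCS-2 data arrives as raw bytes in a known byte order. The helper names (`ucs2_units`, `is_surrogate_free`, `ucs2_to_utf8_via_utf16`) and the `byteorder` parameter are hypothetical, chosen only for illustration. If no 16-bit unit falls in the surrogate range, the data can be decoded with a standard UTF-16 codec and re-encoded as UTF-8.

```python
def ucs2_units(data: bytes, byteorder: str = "little"):
    """Yield each 16-bit code unit of UCS-2 data ("little" or "big" byte order)."""
    if len(data) % 2:
        raise ValueError("UCS-2 data must have an even number of bytes")
    for i in range(0, len(data), 2):
        yield int.from_bytes(data[i:i + 2], byteorder)


def is_surrogate_free(data: bytes, byteorder: str = "little") -> bool:
    """True if no unit lies in D800-DFFF, so a UTF-16 decoder sees only BMP characters."""
    return not any(0xD800 <= u <= 0xDFFF for u in ucs2_units(data, byteorder))


def ucs2_to_utf8_via_utf16(data: bytes, byteorder: str = "little") -> bytes:
    """Convert UCS-2 to UTF-8 with a UTF-16 codec, refusing surrogate-range units."""
    if not is_surrogate_free(data, byteorder):
        raise ValueError("surrogate-range code unit found; not safe to treat as UTF-16")
    codec = "utf-16-le" if byteorder == "little" else "utf-16-be"
    return data.decode(codec).encode("utf-8")


print(ucs2_to_utf8_via_utf16(b"A\x00\xe9\x00"))  # b'A\xc3\xa9'
```

Rejecting surrogate-range units up front is a design choice of this sketch: it makes "UCS-2 fed to a UTF-16 converter" fail loudly instead of silently pairing units into supplementary characters.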

    But converting from UCS-2 to UCS-4 is really easy: just append two
    \0 bytes after each 2-byte character on a little-endian machine, or
    prepend them before each 2-byte character on a big-endian machine
    (assuming the data is in the machine's native byte order). Then,
    since UCS-4 and UTF-32 are identical for these values, you can use
    the UTF-32 to UTF-8 converter.
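The zero-padding trick can be sketched in Python as follows. This version takes the byte order of the data as an explicit (hypothetical) `byteorder` argument rather than relying on the machine's native order, and it uses Python's built-in `utf-32-le`/`utf-32-be` codecs as the "UTF-32 to UTF-8 converter". The function names are illustrative only. Surrogate-range units are passed through unchanged by the widening step, so a strict UTF-32 decoder may reject them.

```python
def ucs2_to_ucs4(data: bytes, byteorder: str = "little") -> bytes:
    """Widen each 2-byte UCS-2 unit to a 4-byte UCS-4 unit by adding two zero bytes."""
    if len(data) % 2:
        raise ValueError("UCS-2 data must have an even number of bytes")
    out = bytearray()
    for i in range(0, len(data), 2):
        unit = data[i:i + 2]
        if byteorder == "little":
            out += unit + b"\x00\x00"  # high-order zero bytes come after
        else:
            out += b"\x00\x00" + unit  # any other value is treated as big-endian
    return bytes(out)


def ucs2_to_utf8(data: bytes, byteorder: str = "little") -> bytes:
    """UCS-2 -> UCS-4 (== UTF-32 for these values) -> UTF-8."""
    ucs4 = ucs2_to_ucs4(data, byteorder)
    codec = "utf-32-le" if byteorder == "little" else "utf-32-be"
    return ucs4.decode(codec).encode("utf-8")


print(ucs2_to_ucs4(b"A\x00"))           # b'A\x00\x00\x00'
print(ucs2_to_utf8(b"\x00A\x00\xe9", "big"))  # b'A\xc3\xa9'
```

Passing the byte order explicitly avoids tying the result to the machine that happens to run the conversion, which is what the "little-endian machine" wording implicitly assumes.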

    Regards,
    Samuel
