RE: FW: Subj: Converting from UCS-2 to UTF-8

From: Rick Cameron (Rick.Cameron@businessobjects.com)
Date: Thu Aug 18 2005 - 15:44:23 CDT


    By its nature, UCS-2 text cannot contain any character with a scalar value greater than U+FFFF. Every valid UCS-2 text is therefore also valid UTF-16 (UCS-2 is a strict subset of UTF-16), so a UTF-16 to UTF-8 converter will work on it.
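As a rough illustration of the point above, here is a minimal Python sketch that converts UCS-2 bytes to UTF-8 by handing them to a UTF-16 decoder. The function name `ucs2_to_utf8` and the `byteorder` parameter are made up for this example, and it assumes the input has no byte order mark and that the caller knows the byte order.

```python
def ucs2_to_utf8(data: bytes, byteorder: str = "little") -> bytes:
    """Convert BOM-less UCS-2 bytes to UTF-8 by decoding them as UTF-16.

    Hypothetical helper for illustration only.
    """
    if len(data) % 2:
        raise ValueError("UCS-2 input must have an even number of bytes")
    # Pick the UTF-16 codec that matches the byte order of the data.
    codec = "utf-16-le" if byteorder == "little" else "utf-16-be"
    # Valid UCS-2 contains no surrogate code units, so the UTF-16
    # decoder sees only single 16-bit units, one per character.
    return data.decode(codec).encode("utf-8")


if __name__ == "__main__":
    # U+00E9 and U+20AC as little-endian UCS-2.
    print(ucs2_to_utf8(b"\xe9\x00\xac\x20"))
```

If the input did contain surrogate pairs, it would no longer be UCS-2 in the strict sense, and the decoder would interpret them as UTF-16 supplementary characters.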

    -----Original Message-----
    From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On Behalf Of Samuel Thibault
    Sent: Thursday, 18 August 2005 13:11
    To: Magda Danish (Unicode)
    Cc: unicode@unicode.org
    Subject: Re: FW: Subj: Converting from UCS-2 to UTF-8

    Magda Danish (Unicode) wrote on Thu 18 Aug 2005 12:42:21 -0700:
    > But it's not clear to me that I can use any of these programs for UCS-2. I am aware that UTF-16 and UCS-2 are almost identical, but it's the "almost" that worries me. Can you confirm that the converter from UTF-16 to UTF-8 will work for converting from UCS-2 to UTF-8 without any loss or corruption of data?

    It will only work if the text doesn't contain Unicode characters
    at or above U+10000.

    But converting from UCS-2 to UCS-4 is really easy: just append two
    \0 bytes after each 2-byte character on a little-endian machine, or
    insert them before each 2-byte character on a big-endian machine.
    Then, since UCS-4 == UTF-32, you can use the UTF-32 to UTF-8 converter.
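The zero-padding step described above could look roughly like the following Python sketch. The names `ucs2_to_ucs4` and `ucs2_to_utf8_via_ucs4` are hypothetical, and the `byteorder` argument refers to the byte order of the data itself, which is assumed to be known and to carry no byte order mark.

```python
def ucs2_to_ucs4(data: bytes, byteorder: str = "little") -> bytes:
    """Widen each 2-byte UCS-2 unit to 4 bytes by adding two zero bytes.

    Hypothetical helper for illustration only.
    """
    if len(data) % 2:
        raise ValueError("UCS-2 input must have an even number of bytes")
    out = bytearray()
    for i in range(0, len(data), 2):
        unit = data[i:i + 2]
        if byteorder == "little":
            # Little-endian: the high-order zero bytes come last.
            out += unit + b"\x00\x00"
        else:
            # Big-endian: the high-order zero bytes come first.
            out += b"\x00\x00" + unit
    return bytes(out)


def ucs2_to_utf8_via_ucs4(data: bytes, byteorder: str = "little") -> bytes:
    """Convert UCS-2 to UTF-8 by way of the widened UCS-4/UTF-32 form."""
    codec = "utf-32-le" if byteorder == "little" else "utf-32-be"
    return ucs2_to_ucs4(data, byteorder).decode(codec).encode("utf-8")
```

Because every code unit is widened independently, a surrogate code unit in the input would become a surrogate value in the 32-bit output, which is not a valid UTF-32 scalar value. This route is therefore only appropriate for input that really is UCS-2.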

    Regards,
    Samuel


