Re: Wide Characters in Windows and UTF16

From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Thu Aug 12 2004 - 11:19:12 CDT


    Rick Cameron wrote:
    > Microsoft Windows uses little-endian byte order on all platforms. Thus, on
    > Windows UTF-16 code units are stored in little-endian byte order in memory.
    >
    > I believe that some linux systems are big-endian and some little-endian. I
    > think linux follows the standard byte order of the CPU. Presumably UTF-16
    > would be big-endian or little-endian accordingly.

    This is somewhat misleading. For internal processing we are talking about the UTF-16 encoding
    form, which is quite different from the external encoding _scheme_ of the same name. In memory we
    do not have strings of bytes but strings of 16-bit code units (WCHAR in Windows). Program code
    operating on such strings could not care less what endianness the CPU uses. Endianness is an issue
    only when the text gets byte-serialized, as is done for the external encoding schemes (usually by
    a conversion service).
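
A minimal sketch of that distinction, written in Python rather than with WCHAR. The helper names `count_code_points` and `serialize` are hypothetical, chosen only for this illustration. The first function works purely on 16-bit code unit values and never looks at byte order. The second is the only place where a byte order has to be chosen.

```python
import struct

def count_code_points(units):
    """Count code points in a sequence of UTF-16 code units (ints 0..0xFFFF).

    This is encoding-form logic: it compares 16-bit values and never
    touches individual bytes, so CPU endianness is irrelevant here.
    """
    count = 0
    i = 0
    while i < len(units):
        is_lead = 0xD800 <= units[i] <= 0xDBFF
        has_trail = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        # A lead surrogate followed by a trail surrogate is one code point.
        i += 2 if (is_lead and has_trail) else 1
        count += 1
    return count

def serialize(units, big_endian):
    """Byte-serialize code units; only here must a byte order be picked."""
    fmt = (">" if big_endian else "<") + "%dH" % len(units)
    return struct.pack(fmt, *units)

# "A" followed by U+1D11E, which needs the surrogate pair D834 DD1E.
units = [0x0041, 0xD834, 0xDD1E]

le_bytes = serialize(units, big_endian=False)  # UTF-16LE encoding scheme
be_bytes = serialize(units, big_endian=True)   # UTF-16BE encoding scheme
```

The same `units` list gives the same code point count on any machine, while the two serializations differ only in the order of the bytes within each 16-bit unit.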

    markus


