RE: UTF-16 inside UTF-8

From: jarkko.hietaniemi@nokia.com
Date: Wed Dec 03 2003 - 04:36:32 EST


    > We're not speaking about the same thing: I was not discussing the
    > representation of individual characters (yes it's simple to make
    > wchar_t 32-bit with UCS4), but the encoding of large amounts of
    > strings for general text processing. That's where UTF-16 is better.

            For some values of "better", and for some values of "text processing".
            Because UTF-16 is variable-width, it can be slow for certain string operations:
            basically anything that requires "random access" into the string, like "give me the
            substring from code point position 1000 to code point position 1999". Unless you have
            some sort of caching, or something else clever, finding a position is O(position)
            instead of O(1).
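
To make the O(position) point concrete, here is a minimal sketch in Python. It is an illustration only, not code from any particular library: the helper names utf16_units, unit_offset and substring are made up for this example, and it assumes the string is held as a plain list of 16-bit code units. Finding the code unit where code point N starts means walking from the beginning and skipping surrogate pairs as you go.

```python
import struct

def utf16_units(text):
    """Return the UTF-16 code units of text as a list of integers (hypothetical helper)."""
    data = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))

def unit_offset(units, cp_count, start=0):
    """Advance cp_count code points from code-unit offset start.

    There is no way to jump directly: every code unit on the way has to be
    inspected, so the cost grows linearly with cp_count.
    """
    offset = start
    for _ in range(cp_count):
        if offset >= len(units):
            raise IndexError("code point index out of range")
        is_high = 0xD800 <= units[offset] <= 0xDBFF
        has_low = offset + 1 < len(units) and 0xDC00 <= units[offset + 1] <= 0xDFFF
        # A well-formed surrogate pair is one code point stored in two code units.
        offset += 2 if (is_high and has_low) else 1
    return offset

def substring(units, start, end):
    """Return the code units covering code points start .. end-1."""
    begin = unit_offset(units, start)
    stop = unit_offset(units, end - start, begin)
    return units[begin:stop]

# Usage: U+1F600 lies outside the BMP, so it occupies two code units.
units = utf16_units("a\U0001F600b\U0001F600c")
print(len(units))              # number of code units, not code points
print(unit_offset(units, 2))   # code-unit offset where code point 2 starts
```

With a fixed-width representation such as UCS-4, the same lookup is plain indexing. The "caching" mentioned above would amount to remembering some (code point index, code unit offset) pairs so that a walk can start from the nearest remembered position rather than from offset 0.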


