From: Doug Ewell (dewell@adelphia.net)
Date: Tue Nov 23 2004 - 23:49:13 CST
Harshal Trivedi <harshal dot trivedi at gmail dot com> wrote:
> How can I determine the end of a UCS-2/UCS-4 string while encoding it
> in a C program?
> A normal C string ends with '\0' (ASCII NUL) as the terminating
> character. What symbol, pattern, or character in UCS-2 or UCS-4
> substitutes for that ASCII NUL as the termination symbol?
You wouldn't normally use the ordinary C string type (an array of char)
to hold a UTF-16 (not UCS-2, please) or UCS-4 string. It isn't meant for
that, for exactly the reason your question implies: incidental zero
bytes will cause premature termination of the string, because almost
all C implementations use 8-bit char units and the standard string
functions stop at the first zero byte.
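To illustrate the problem, here is a minimal sketch, assuming 8-bit char
and a UTF-16 buffer stored as uint16_t units. The array name and the
helper length_seen_by_strlen() are made up for this example; only
strlen() is a standard function.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* "AB" as UTF-16 code units, followed by a zero code unit. */
static const uint16_t sample_utf16[] = { 0x0041, 0x0042, 0x0000 };

/* Hypothetical helper: what strlen() reports when the same memory is
   (incorrectly) treated as an ordinary char string. Each code unit
   contains a zero byte, so the scan stops almost immediately. */
static size_t length_seen_by_strlen(void)
{
    return strlen((const char *)sample_utf16);
}
```

On a little-endian machine the call returns 1, and on a big-endian
machine it returns 0. Neither matches the 2 code units actually present.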
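The solution is either to use UTF-8, or to use "wide character" strings
based on 16-bit (or, less likely, 32-bit) "character" units. A wide
string is terminated by a single zero code unit (U+0000) of that width,
not by a zero byte. UTF-8 never produces a zero byte except for U+0000
itself, so the usual '\0' terminator keeps working there.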
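A rough sketch of the wide-unit option, assuming 16-bit units held in
uint16_t (wchar_t is 16 bits on some platforms and 32 on others, so a
fixed-width type is used here). The helper utf16_units() and the sample
arrays are hypothetical names for this illustration, not part of any
standard library.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: count 16-bit code units up to, but not
   including, the terminating zero code unit (0x0000). */
static size_t utf16_units(const uint16_t *s)
{
    size_t n = 0;
    while (s[n] != 0x0000)
        n++;
    return n;
}

/* "Hi": each unit has a zero high byte, yet neither unit is zero. */
static const uint16_t hi_utf16[] = { 0x0048, 0x0069, 0x0000 };

/* U+1D11E as a surrogate pair: two code units, then the terminator. */
static const uint16_t clef_utf16[] = { 0xD834, 0xDD1E, 0x0000 };
```

Both utf16_units(hi_utf16) and utf16_units(clef_utf16) return 2. The
count is in code units, not characters, which is why the surrogate pair
counts as two.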
-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/