From: Phillips, Addison (firstname.lastname@example.org)
Date: Sat Jun 27 2009 - 11:30:11 CDT
Let me see if I understand what you’re asking.
The Unicode character set defines characters. One of these characters, at code point 0, is the NULL character. See 
UTF-16 is a character encoding of the Unicode character set. In UTF-16, each Unicode code point (“character”) is represented by one or (occasionally) two 16-bit “code units” [by comparison, a byte is an 8-bit “code unit”]. The NULL character, in this encoding, is represented by a 16-bit code unit in which all of the bits are set to 0. A UTF-16 string consists of a sequence of 16-bit code units and it is a convention of many programming languages that the character NULL marks the end of a string buffer. In these programming languages, the appearance of a 16-bit NULL will cause the string to terminate.
If, by “character” or “value zero”, you mean the (8-bit) byte value zero, then, yes, there will be a lot of “zero” bytes in a UTF-16 encoded buffer: these do not represent the character NULL on their own. This doesn’t cause buffer termination, because one does not use an 8-bit byte to access a UTF-16 string. If you use a “uint16_t” pointer for your UTF-16 string, the pointer will advance 16 bits at a time, rather than 8, through the buffer. A single code unit in this string is always 16 bits wide. Only a 16-bit “null” represents the character NULL.
If you want to use bytes (char* in C), then you would use a different character encoding of Unicode (UTF-8). In this encoding, the null byte represents only the character NULL and is never part of a larger character unit.
I hope that helps explain it. You might also glance at my character encoding tutorial, or even order a copy of the Unicode Guide, to help you out.
Globalization Architect -- Lab126
Internationalization is not a feature.
It is an architecture.
From: email@example.com [mailto:firstname.lastname@example.org] On Behalf Of Venugopalan G
Sent: Saturday, June 27, 2009 5:35 AM
Subject: Zero termination
I just want to know whether a valid UTF-16 string can contain the value zero (0): not the character zero, but the 16-bit value zero.
That is, if I iterate through each Unicode character (16 bits), will I find zero at any point? Is zero a valid code point, or a part of a code point?
Basically, can I use zero to mark the termination of a UTF-16 string? Because if zero can appear in the middle of the string, then the program will terminate it in the wrong place.
This archive was generated by hypermail 2.1.5 : Sat Jun 27 2009 - 11:34:41 CDT