From: Andrew Lipscomb (ewwa@chattanooga.net)
Date: Mon Jun 29 2009 - 09:46:03 CDT
> Hi grp,
> I just want to know if a valid UTF16 string can contain the value zero(0),
> not the character zero but the 16bit value zero.
> Like, if i iterate through each unicode character(16 bits), will i find zero
> at any time? Is Zero a valid code point or a part of a code point?
> Basically can i use zero to represent termination of a U16 string? because
> if zero is in the middle of str, then the program will terminate in wrong
> place.
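Unicode is not directly concerned with the validity of strings as
such. The 16-bit value 0000 is the UTF-16 encoding of U+0000 (NUL),
which is a valid code point. It is never part of the encoding of any
other character, because the code units of a surrogate pair all lie
in the range D800 to DFFF. Whether you can use it as a terminator
therefore depends only on whether your own data ever needs to contain
NUL.

A single byte containing the hex value 00, on the other hand, will
occur frequently within typical UTF-16 text: in every character of
ISO Latin-1 (U+0000 to U+00FF), and in two characters that are common
in Chinese, the ideographic space (U+3000) and the ideograph for
"one" (U+4E00). A scan for the terminator has to proceed in 16-bit
units, not in bytes. A string terminated by a pair of 00 bytes
(aligned to the character boundary), in other words a NUL represented
in UTF-16, is not unheard of as a terminator. Windows, at least, uses
that for its "wide character" string format.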
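
To make the distinction between 00 bytes and a 0000 code unit
concrete, here is a minimal Python sketch. The helper names
(utf16_units, terminated_length) and the sample string are made up
for illustration, and little-endian byte order is assumed; this is
not how any particular platform implements its string routines.

```python
def utf16_units(text):
    """Return the UTF-16 code units of text as a list of ints."""
    data = text.encode("utf-16-le")  # little-endian, no byte order mark
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


def terminated_length(data):
    """Count code units before the first aligned 0000 unit.

    Returns None if the buffer holds no aligned 0000 unit.
    """
    for i in range(0, len(data) - 1, 2):  # step by whole 16-bit units
        if data[i] == 0 and data[i + 1] == 0:
            return i // 2
    return None


# 'A', space, ideographic space (U+3000), ideograph for "one" (U+4E00)
sample = "A \u3000\u4e00"
encoded = sample.encode("utf-16-le")

zero_bytes = encoded.count(0)              # individual 00 bytes
zero_units = utf16_units(sample).count(0)  # whole 0000 code units
unaligned = encoded.find(b"\x00\x00")      # byte-wise search, ignores alignment

# Append a 16-bit terminator, as a "wide character" string would have
buffer = encoded + b"\x00\x00"
length = terminated_length(buffer)

print(zero_bytes, zero_units, unaligned, length)  # prints: 4 0 3 4
```

The sample contains four 00 bytes but no 0000 code unit. A byte-wise
search for two consecutive 00 bytes matches at offset 3, straddling
two characters, which is why the terminator has to be looked for on
the 16-bit boundary.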