Re: Zero termination

From: Doug Ewell
Date: Sat Jun 27 2009 - 11:58:10 CDT

    John (Eljay) Love-Jensen <eljay at adobe dot com> replied to Venugopalan:

    >> Like, if I iterate through each Unicode character (16 bits), will I
    >> find zero at any time?
    > It is possible.
    >> Basically, can I use zero to represent termination of a U16 string?
    > If your *OWN* encoding reserves 0x0000 for UTF-16 termination, and
    > what you are encoding itself does not have U+0000 code points in it,
    > then using 0x0000 as your own string termination instead of
    > representing a UTF-16 code point is a reasonable compromise.
    > But if what you are trying to represent is any valid Unicode sequence,
    > then U+0000 would be a valid Unicode code point which your string
    > class could not contain. Unless you re-encode U+0000 as something
    > else... but then your string class would not be UTF-16, it would be
    > something close-to-but-not-quite UTF-16, and could not stream in
    > "pure" UTF-16 without translating it into your close-but-not-quite
    > UTF-16.

    To clarify slightly, the problem is no different for Unicode from what
    it is for ASCII or ISO 8859. ASCII itself does not prohibit 0x00 as
    part of a string, because the definition of "string" is outside its
    scope. Likewise, Unicode does not prohibit U+0000. However, most
    modern protocols do treat "null" as invalid within a string, usually
    because they reserve it as the string terminator.

    Strings that did not contain 0x00 in an 8-bit character set will not
    contain U+0000 when converted to Unicode.
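
    As a rough illustration of that point, here is a minimal Python
    sketch. It assumes ISO 8859-1 as the 8-bit character set and UTF-16
    little-endian as the target; the helper name utf16_code_units is
    made up for this example.

```python
import struct

def utf16_code_units(text):
    """Return the 16-bit UTF-16 code units of text as a list of ints."""
    data = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))

# Every 8-bit value except 0x00, interpreted as ISO 8859-1 (an assumption
# made for this sketch; other 8-bit sets need their own codec).
latin1_bytes = bytes(range(1, 256))
text = latin1_bytes.decode("latin-1")

units = utf16_code_units(text)
has_zero_unit = 0x0000 in units   # no input byte was 0x00, so this is False
```

    Note that this checks 16-bit code units, not bytes. Individual
    bytes of the UTF-16 data are often zero (the letter "A" is the
    bytes 0x41 0x00 in little-endian order), so a byte-oriented scan
    for a terminator would not be suitable here.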

    If you want to process *any arbitrary sequence of Unicode characters* as
    a string, then you may have problems with U+0000 -- but that would have
    been true if you wanted to process any arbitrary sequence of bytes as an
    ASCII string.
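
    To make the failure mode concrete, here is a small Python sketch of
    a zero-terminated scan over 16-bit code units. The function name
    length_until_zero and the sample data are hypothetical and only
    meant to show the effect of an embedded U+0000.

```python
def length_until_zero(units):
    """Count 16-bit code units before the first 0x0000, as a C-style scan would."""
    count = 0
    for unit in units:
        if unit == 0x0000:
            break
        count += 1
    return count

# "abc" followed by the terminator: the scan sees all three characters.
plain = [0x0061, 0x0062, 0x0063, 0x0000]
plain_len = length_until_zero(plain)

# "ab", an embedded U+0000, "cd", then the terminator: the scan stops
# at the embedded U+0000 and never reaches "cd".
embedded = [0x0061, 0x0062, 0x0000, 0x0063, 0x0064, 0x0000]
embedded_len = length_until_zero(embedded)
```

    A string type that has to hold arbitrary code points, U+0000
    included, would typically store an explicit length rather than
    rely on a terminator.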

    Doug Ewell  *  Thornton, Colorado, USA  *  RFC 4645  *  UTN #14
