Re: What does it mean to "not be a valid string in Unicode"?

From: Markus Scherer <>
Date: Fri, 4 Jan 2013 19:23:16 -0800

On Fri, Jan 4, 2013 at 6:08 PM, Stephan Stiller wrote:

> Is there a most general sense in which there are constraints beyond all
> characters being from within the range U+0000 ... U+10FFFF? If one is
> concerned with computer security, oddities that are absolute should raise a
> flag; somebody could be messing with my system.

If you are concerned with computer security, then I suggest you read "Unicode Security Considerations".

> For example, the original C datatype named "string", as it is understood
> and manipulated by the C standard library, has an *absolute* prohibition
> against U+0000 anywhere inside.

That's not so much a prohibition as an artifact of NUL-termination of
strings. In more modern libraries, a string's contents and its explicit
length are stored together, so you can store a 00 byte just fine, for
example in a C++ string.

Received on Fri Jan 04 2013 - 21:26:59 CST

This archive was generated by hypermail 2.2.0 : Fri Jan 04 2013 - 21:27:05 CST