Re: What does it mean to "not be a valid string in Unicode"?

From: Markus Scherer <markus.icu_at_gmail.com>
Date: Fri, 4 Jan 2013 19:23:16 -0800

On Fri, Jan 4, 2013 at 6:08 PM, Stephan Stiller
<stephan.stiller_at_gmail.com>wrote:

> Is there a most general sense in which there are constraints beyond all
> characters being from within the range U+0000 ... U+10FFFF? If one is
> concerned with computer security, oddities that are absolute should raise a
> flag; somebody could be messing with my system.
>

If you are concerned with computer security, then I suggest you read
http://www.unicode.org/reports/tr36/ "Unicode Security Considerations".

For example, the original C datatype named "string", as it is understood
> and manipulated by the C standard library, has an *absolute* prohibition
> against U+0000 anywhere inside.
>

That's not so much a prohibition as an artifact of the NUL-termination of
strings. In more modern libraries, a string's contents and its explicit
length are stored together, so you can store a 00 byte just fine, for
example in a C++ std::string.

markus
Received on Fri Jan 04 2013 - 21:26:59 CST

This archive was generated by hypermail 2.2.0 : Fri Jan 04 2013 - 21:27:05 CST