From: Doug Ewell (dewell@adelphia.net)
Date: Mon Nov 15 2004 - 00:46:31 CST
John Cowan <jcowan at reutershealth dot com> wrote:
> Most languages other than C define a string as a sequence of
> characters rather than a sequence of non-null characters. The
> repertoire of characters that can exist in strings usually has a lower
> bound, but its full magnitude is implementation-specific. In Java,
> exceptionally, the repertoire is defined by the standard rather than
> the implementation, and it includes U+0000. In any case, I can think
> of no language other than C which does not support strings containing
> U+0000 in most implementations.
In Pascal, which I learned before C, strings were implemented as a count
of characters followed by the characters themselves. Unfortunately, the
count was a single byte, and the resulting maximum string length of 255
was a much greater inconvenience in real life than C's prohibition
against a string containing 0x00. I don't know if modern Pascal
implementations are the same way.
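The following is a rough C sketch of that kind of length-prefixed string, assuming a one-byte count followed by up to 255 bytes of characters. `ShortString` and `short_string_set` are hypothetical names, and the layout is not meant to reproduce any particular Pascal compiler's format. It shows both sides of the trade-off: 0x00 is stored like any other byte, but anything longer than 255 bytes is rejected.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: one count byte, then the characters.
   Because the count is a single byte, 255 is the maximum length. */
typedef struct {
    unsigned char len;
    unsigned char data[255];
} ShortString;

/* Copies n bytes from src into s.  Returns 0 on success, or -1 if the
   text is too long for a one-byte count.  Embedded 0x00 bytes are
   copied like any other byte, since the length is not inferred from them. */
int short_string_set(ShortString *s, const unsigned char *src, size_t n)
{
    if (n > 255)
        return -1;
    s->len = (unsigned char)n;
    if (n > 0)
        memcpy(s->data, src, n);
    return 0;
}
```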
A 32-bit length count, followed by an array of N arbitrary Unicode
characters, would probably be the best implementation today.
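One possible C rendering of that suggestion, offered only as a sketch, stores a 32-bit count immediately followed (through a C99 flexible array member) by that many code points, each held in a `uint32_t`. The type `UString` and the function `ustring_new` are hypothetical, and storing code points rather than UTF-8 or UTF-16 code units is an assumption of this sketch.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type: a 32-bit count followed directly by the characters.
   U+0000 is an ordinary element; nothing is reserved as a terminator. */
typedef struct {
    uint32_t len;
    uint32_t chars[];   /* flexible array member: one code point per element */
} UString;

/* Allocates header and characters in one block and copies n code points.
   Returns NULL if the size would overflow or allocation fails. */
UString *ustring_new(const uint32_t *cps, uint32_t n)
{
    if ((size_t)n > (SIZE_MAX - sizeof(UString)) / sizeof(uint32_t))
        return NULL;
    UString *s = malloc(sizeof(UString) + (size_t)n * sizeof(uint32_t));
    if (s == NULL)
        return NULL;
    s->len = n;
    if (n > 0)
        memcpy(s->chars, cps, (size_t)n * sizeof(uint32_t));
    return s;
}
```

Because the count and the characters share one allocation, a caller releases the whole string with a single `free()`.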
I'd still like to know what practical, real-world TEXT-related benefits
would derive from allowing U+0000 in strings of TEXT in a C program.
-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/
This archive was generated by hypermail 2.1.5 : Mon Nov 15 2004 - 00:48:45 CST