Re: U+0000 in C strings (was: Re: Opinions on this Java URL?)

From: Doug Ewell (
Date: Mon Nov 15 2004 - 00:46:31 CST

    John Cowan <jcowan at reutershealth dot com> wrote:

    > Most languages other than C define a string as a sequence of
    > characters rather than a sequence of non-null characters. The
    > repertoire of characters that can exist in strings usually has a lower
    > bound, but its full magnitude is implementation-specific. In Java,
    > exceptionally, the repertoire is defined by the standard rather than
    > the implementation, and it includes U+0000. In any case, I can think
    > of no language other than C which does not support strings containing
    > U+0000 in most implementations.

    In Pascal, which I learned before C, strings were implemented as a count
    of characters followed by the characters themselves. Unfortunately, the
    count was a single byte, and the resulting maximum string length of 255
    was a much greater inconvenience in real life than C's prohibition
    against a string containing 0x00. I don't know if modern Pascal
    implementations are the same way.
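
    A minimal C sketch of that kind of layout, for illustration only. The
    names ShortString and short_assign are hypothetical and do not come from
    any particular Pascal compiler. It shows both halves of the trade-off:
    the one-byte count caps the length at 255, but a 0x00 byte is just
    another character.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical Pascal-style "short string": one count byte, then the data. */
typedef struct {
    unsigned char len;   /* 0..255, so at most 255 characters */
    char data[255];      /* not NUL-terminated; len says where it ends */
} ShortString;

/* Copy n bytes from src into s, truncating at 255 because the count
   is a single byte. Returns the number of bytes actually stored. */
static size_t short_assign(ShortString *s, const char *src, size_t n)
{
    assert(s != NULL && src != NULL);
    if (n > 255) {
        n = 255;         /* the one-byte count cannot represent more */
    }
    memcpy(s->data, src, n);
    s->len = (unsigned char)n;
    return n;
}
```

    Given the five bytes 'a', 'b', 0x00, 'c', 'd', short_assign() stores all
    five and sets len to 5, whereas strlen() on the same bytes stops at the
    0x00 and reports 2.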

    A 32-bit length count, followed by an array of N arbitrary Unicode
    characters, would probably be the best implementation today.
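
    One way such a counted string might look in C, as a sketch under stated
    assumptions: "Unicode characters" is taken here to mean 32-bit code
    points (UTF-32), and the names UString and ustring_new are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counted string: a 32-bit length, then that many code points. */
typedef struct {
    uint32_t len;    /* number of code points that follow */
    uint32_t cp[];   /* flexible array member (C99); assumed UTF-32 */
} UString;

/* Allocate a UString holding a copy of n code points from src.
   Returns NULL on allocation failure or size overflow.
   The caller releases the result with free(). */
static UString *ustring_new(const uint32_t *src, uint32_t n)
{
    assert(src != NULL || n == 0);
    /* Guard the size computation on platforms with a narrow size_t. */
    if (n > (SIZE_MAX - sizeof(UString)) / sizeof(uint32_t)) {
        return NULL;
    }
    UString *s = malloc(sizeof(UString) + (size_t)n * sizeof(uint32_t));
    if (s == NULL) {
        return NULL;
    }
    s->len = n;
    if (n > 0) {
        memcpy(s->cp, src, (size_t)n * sizeof(uint32_t));
    }
    return s;
}
```

    Because the length is stored explicitly, a string built from the code
    points U+0041, U+0000, U+1F600 keeps all three, and U+0000 needs no
    special treatment.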

    I'd still like to know what practical, real-world TEXT-related benefits
    would derive from allowing U+0000 in strings of TEXT in a C program.

    -Doug Ewell
     Fullerton, California

    This archive was generated by hypermail 2.1.5 : Mon Nov 15 2004 - 00:48:45 CST