Re: U+0000 in C strings (was: Re: Opinions on this Java URL?)

From: John Cowan
Date: Mon Nov 15 2004 - 10:41:13 CST


    Doug Ewell scripsit:

    > Then why do the DataInput and DataOutput interfaces perform this special
    > conversion? There isn't any mention, on the page whose URL Theodore
    > originally provided, of compatibility with C strings.

    Probably because Sun was reusing the format that string literals take in
    compiled Java classes. The format is as compact as UTF-8 provided your
    characters are in the range U+0001 to U+FFFF, which is true most of the time.
    Serializing with a 32-bit length would be much bulkier.
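
To make the size comparison concrete, here is a minimal sketch that writes a few strings with DataOutputStream.writeUTF and prints the bytes next to standard UTF-8. The class name ModifiedUtf8Demo and the helpers writeUtf and hex are made up for this illustration; the sample strings are arbitrary.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: compares writeUTF output with standard UTF-8.
class ModifiedUtf8Demo {
    // writeUTF emits a 2-byte big-endian byte count, then the modified UTF-8 data.
    static byte[] writeUtf(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked for brevity.
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Format bytes as space-separated uppercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // ASCII, U+0000, a 2-byte char, a 3-byte char, and a supplementary char.
        String[] samples = { "A", "\u0000", "\u00E9", "\u20AC", "\uD83D\uDE00" };
        for (String s : samples) {
            byte[] modified = writeUtf(s);
            byte[] standard = s.getBytes(StandardCharsets.UTF_8);
            System.out.println("writeUTF: " + hex(modified)
                    + " | UTF-8: " + hex(standard));
        }
    }
}
```

For U+0000 the data bytes come out as C0 80 rather than 00, and the supplementary character is written as two 3-byte surrogate encodings (6 data bytes against 4 in standard UTF-8). For everything from U+0001 to U+FFFF the data bytes match UTF-8, with only the 2-byte count added in front.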

    > If a Java String consists of a count followed by the data,

    I didn't say that. A Java String in memory contains a count and the data,
    because it is basically a wrapper around a Java array of characters, and Java
    arrays contain a count. (Strings, unlike arrays, are immutable in Java.)
    That does not mean that the count is "followed by" the data in the memory
    representation, which indeed is up to the JVM -- Java does not prescribe it.
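
A small sketch of what the count means in practice, using only the public String API (which says nothing about the in-memory layout). The class name StringCountDemo and the helper withEmbeddedNul are invented for this example.

```java
// Hypothetical demo class: a Java String carries its own length,
// so U+0000 is an ordinary character rather than a terminator.
class StringCountDemo {
    // Builds a 5-character string with U+0000 in the middle.
    static String withEmbeddedNul() {
        return "ab" + '\u0000' + "cd";
    }

    public static void main(String[] args) {
        String s = withEmbeddedNul();
        System.out.println(s.length());         // 5: the count includes U+0000
        System.out.println((int) s.charAt(2));  // 0
        System.out.println(s.indexOf("cd"));    // 3: the text after U+0000 is still there

        // toCharArray returns a copy; the array has its own length field.
        char[] copy = s.toCharArray();
        System.out.println(copy.length);        // 5
        copy[0] = 'X';                          // changes the copy only
        System.out.println(s.charAt(0));        // a: the String itself is immutable
    }
}
```

Nothing here depends on where the JVM keeps the count relative to the characters; the program only sees length() and the array's length field.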

    > Those are design benefits. I was asking about the ability to represent
    > text adequately.

    Strings are not used solely to represent text; they are more general.

    John Cowan
    Consider the matter of Analytic Philosophy.  Dennett and Bennett are well-known.
    Dennett rarely or never cites Bennett, so Bennett rarely or never cites Dennett.
    There is also one Dummett.  By their works shall ye know them.  However, just as
    no trinities have fourth persons (Zeppo Marx notwithstanding), Bummett is hardly
    known by his works.  Indeed, Bummett does not exist.  It is part of the function
    of this and other e-mail messages, therefore, to do what they can to create him.
