Re: U+0000 in C strings (was: Re: Opinions on this Java URL?)

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Mon Nov 15 2004 - 11:32:33 CST


    From: "Doug Ewell" <dewell@adelphia.net>
    > John Cowan <jcowan at reutershealth dot com> wrote:
    >
    >>> A 32-bit length count, followed by an array of N arbitrary Unicode
    >>> characters, would probably be the best implementation today.
    >>
    >> Which is essentially what the Java String class has, if you unwrap it.
    >
    > Then why do the DataInput and DataOutput interfaces perform this special
    > conversion? There isn't any mention, on the page whose URL Theodore
    > originally provided, of compatibility with C strings. If a Java String
    > consists of a count followed by the data, why would "embedded nulls" in
    > the data make any difference?

    It is needed by the class loader, to load the string constant pool of
    compiled classes.

    It is also needed in the JNI interface to C, which has a legacy 8-bit string
    interface inherited from old versions of Java (this interface has no separate
    string-length indicator and uses null-terminated strings).
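    To make the special conversion concrete, here is a minimal sketch of what
    DataOutputStream.writeUTF emits for a string containing U+0000. The class
    and helper names (ModifiedUtf8Demo, writeUtf, hex) are made up for this
    illustration; only DataOutputStream.writeUTF itself is the real API. Note
    that writeUTF prefixes the data with a 2-byte length, so the first two
    bytes are not part of the encoded characters.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class ModifiedUtf8Demo {
    // Hypothetical helper: returns the exact bytes written by writeUTF,
    // i.e. a 2-byte big-endian length followed by the modified UTF-8 data.
    static byte[] writeUtf(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked.
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Hypothetical helper: formats bytes as space-separated uppercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "A", U+0000, "B": the null is written as the two bytes C0 80,
        // so no 00 byte appears in the character data.
        System.out.println(hex(writeUtf("A\0B"))); // prints 00 04 41 C0 80 42
    }
}
```

    Because the encoded data never contains a 00 byte, it can be handed to a
    null-terminated C string interface without being cut short.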

    But it is not needed with the newer JNI interfaces for C, where strings are
    arrays of 16-bit "char" code units with a separate, explicit 32-bit string
    length indicator (no need to escape nulls).

    It is neither needed nor used for file or stream I/O, where *true* UTF-8 is
    supported by the Charset instance named "UTF-8" (which fully complies with
    the Unicode definition of UTF-8).
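    For comparison, a small sketch of the standard path, assuming a Java
    version that has String.getBytes(Charset). The class and method names
    (StandardUtf8Demo, encode, decode) are invented for the example;
    Charset.forName("UTF-8") is the real API. Here U+0000 is encoded as a
    single 00 byte, as the Unicode definition of UTF-8 requires, and it
    survives a round trip.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class StandardUtf8Demo {
    // The standard charset, looked up by its name.
    static final Charset UTF8 = Charset.forName("UTF-8");

    // Hypothetical helper: encodes a string as standard UTF-8.
    static byte[] encode(String s) {
        return s.getBytes(UTF8);
    }

    // Hypothetical helper: decodes standard UTF-8 bytes back to a string.
    static String decode(byte[] bytes) {
        return new String(bytes, UTF8);
    }

    public static void main(String[] args) {
        byte[] bytes = encode("A\0B");
        // U+0000 becomes one 00 byte; there is no C0 80 escape.
        System.out.println(Arrays.toString(bytes)); // prints [65, 0, 66]
        // The embedded null is preserved: the decoded string has 3 chars.
        System.out.println(decode(bytes).length()); // prints 3
    }
}
```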


