From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Mon Nov 15 2004 - 11:32:33 CST
From: "Doug Ewell" <dewell@adelphia.net>
> John Cowan <jcowan at reutershealth dot com> wrote:
>
>>> A 32-bit length count, followed by an array of N arbitrary Unicode
>>> characters, would probably be the best implementation today.
>>
>> Which is essentially what the Java String class has, if you unwrap it.
>
> Then why do the DataInput and DataOutput interfaces perform this special
> conversion? There isn't any mention, on the page whose URL Theodore
> originally provided, of compatibility with C strings. If a Java String
> consists of a count followed by the data, why would "embedded nulls" in
> the data make any difference?
Needed for the class loader, to load the string constant pool of
compiled classes.
Needed in the JNI interface to C, which has a legacy 8-bit string interface
inherited from old versions of Java (this interface lacks a separate
string-length indicator and uses null-terminated strings).
But not needed with the newer JNI interfaces for C, where strings are arrays
of 16-bit "char" code units with a separate, explicit 32-bit string length
indicator (no need to escape nulls).
Not needed and not used for file or stream I/O, where *true* UTF-8 is
supported by the "UTF-8"-named Charset instance (which fully complies with
the Unicode definition of UTF-8).
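
To make the difference concrete, here is a minimal sketch comparing the two
encodings for a string with an embedded null. The class name ModifiedUtf8Demo
and its helper methods are hypothetical names chosen for illustration; only
DataOutputStream.writeUTF, String.getBytes and StandardCharsets.UTF_8 come
from the standard library. writeUTF emits a 2-byte length followed by the
special encoding, in which U+0000 becomes the two bytes C0 80, while the
"UTF-8" charset writes a single 00 byte.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class ModifiedUtf8Demo {
    // Hypothetical helper: bytes produced by DataOutput.writeUTF,
    // i.e. a 2-byte big-endian byte count followed by the special encoding.
    public static byte[] modifiedUtf8(String s) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        }
        return buf.toByteArray();
    }

    // Hypothetical helper: bytes produced by the standard "UTF-8" Charset.
    public static byte[] standardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical helper: space-separated uppercase hex dump.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        String s = "A\0B"; // 'A', U+0000, 'B'
        System.out.println(hex(modifiedUtf8(s))); // 00 04 41 C0 80 42
        System.out.println(hex(standardUtf8(s))); // 41 00 42
    }
}
```

In the first dump no byte of the encoded string data is zero (the 00 appears
only in the length prefix), which is what a null-terminated C string interface
requires; the second dump contains a literal 00 in the middle of the data.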