Re: Strange UTF-8 in Java

From: John Cowan (
Date: Wed Sep 30 1998 - 12:34:06 EDT

Mark Davis scripsit:

> 2. For serializing strings internally, they use a byte format which is the same
> as UTF-8, except that they use two bytes for null (<C0, 80>). The standard algorithm
> for converting UTF-8 to Unicode will convert this correctly back to a null,
> unless special checks are made for shortest forms.

IMHO, the only real blunder the Javasoft folks made in this respect
was in the names of the methods DataInput.readUTF() and DataOutput.writeUTF(),
which suggest that these are general-purpose UTF-8 transput methods.
As I have said, they are meant to transput Java Strings in binary
contexts, include a length value, and should have been called
DataInput.readString() and DataOutput.writeString().
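
To make the difference concrete, here is a minimal sketch, assuming a
reasonably modern JDK (Java 8 or later). The class and helper names
(WriteUtfDemo, viaWriteUTF, viaStandardUtf8, hex) are mine, for illustration
only. It writes the same two-character string once with
DataOutputStream.writeUTF() and once as ordinary UTF-8, and dumps both as hex.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class WriteUtfDemo {
    // Bytes produced by writeUTF(): a two-byte big-endian length,
    // followed by the string in Java's modified UTF-8.
    static byte[] viaWriteUTF(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked.
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Bytes produced by the ordinary UTF-8 encoder: no length prefix,
    // and U+0000 is the single byte 00.
    static byte[] viaStandardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Space-separated uppercase hex dump of a byte array.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "A\0"; // 'A' followed by U+0000
        System.out.println("writeUTF: " + hex(viaWriteUTF(s)));
        System.out.println("UTF-8:    " + hex(viaStandardUtf8(s)));
    }
}
```

The writeUTF() output starts with the length (00 03) and encodes the null as
C0 80, while the plain UTF-8 output is just 41 00. That length prefix is why
the result is a serialized String record rather than a UTF-8 text stream.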

John Cowan
	You tollerday donsk?  N.  You tolkatiff scowegian?  Nn.
	You spigotty anglease?  Nnn.  You phonio saxo?  Nnnn.
		Clear all so!  'Tis a Jute.... (Finnegans Wake 16.5)
