Re: Strange UTF-8 in Java

From: John Cowan (cowan@locke.ccil.org)
Date: Wed Sep 30 1998 - 12:34:06 EDT

Next message: Roman Czyborra: "Re: UTF16 <=> Reuters format?"
Previous message: Mark Davis: "Re: Strange UTF-8 in Java"
Maybe in reply to: Elliotte Rusty Harold: "Strange UTF-8 in Java"
Next in thread: Rick McGowan: "Re: Strange UTF-8 in Java"

Mark Davis scripsit:

> 2. For serializing strings internally, they use a byte format which is the same
> as UTF-8, except that they use two bytes for null (<C0, 80>). The standard algorithm
> for converting UTF-8 to Unicode will convert this correctly back to a null,
> unless special checks are made for shortest forms.
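
To make the quoted point concrete, here is a minimal sketch of the textbook
bit arithmetic for a two-byte sequence. The class name NaiveUtf8Decode, the
method decodeTwoByte, and the strict flag are hypothetical names chosen for
illustration; they are not part of any Java library. With the shortest-form
check off, <C0, 80> decodes to U+0000; with it on, the same bytes are rejected.

```java
public class NaiveUtf8Decode {
    // Decodes one two-byte UTF-8 sequence (110xxxxx 10xxxxxx).
    // When strict is true, non-shortest ("overlong") forms are rejected.
    public static int decodeTwoByte(int b1, int b2, boolean strict) {
        if ((b1 & 0xE0) != 0xC0 || (b2 & 0xC0) != 0x80) {
            throw new IllegalArgumentException("not a two-byte sequence");
        }
        // Five payload bits from the lead byte, six from the trail byte.
        int cp = ((b1 & 0x1F) << 6) | (b2 & 0x3F);
        // Anything below U+0080 fits in one byte, so two bytes is overlong.
        if (strict && cp < 0x80) {
            throw new IllegalArgumentException("overlong form");
        }
        return cp;
    }

    public static void main(String[] args) {
        // Lenient decoding: prints 0, i.e. U+0000.
        System.out.println(decodeTwoByte(0xC0, 0x80, false));
        try {
            decodeTwoByte(0xC0, 0x80, true);
        } catch (IllegalArgumentException e) {
            // Strict decoding refuses the same bytes.
            System.out.println("strict decoder rejected: " + e.getMessage());
        }
    }
}
```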

IMHO, the only real blunder the JavaSoft folks made in this respect
was in the names of the methods DataInput.readUTF() and DataOutput.writeUTF(),
which suggest that these are general-purpose UTF-8 transput methods.
As I have said, they are meant to transput Java Strings in binary
contexts and include a length value, so they should have been called
DataInput.readString() and DataOutput.writeString().
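
The difference is easy to see by dumping the bytes. The following is a small
sketch, not a definitive test: the class WriteUtfDemo and its helpers
viaWriteUTF and hex are hypothetical names, and the comparison uses
String.getBytes(StandardCharsets.UTF_8), which is a later addition to the
Java library. The output of writeUTF starts with a two-byte length and
encodes U+0000 as <C0, 80>, while the general-purpose encoder emits neither.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class WriteUtfDemo {
    // Returns exactly the bytes that DataOutput.writeUTF() emits for s.
    public static byte[] viaWriteUTF(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked.
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Formats bytes as space-separated uppercase hex pairs.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String s = "A\0"; // 'A' followed by U+0000

        // Length prefix 00 03, then 41, then the two-byte null.
        System.out.println(hex(viaWriteUTF(s)));                    // 00 03 41 C0 80

        // Plain UTF-8: no length, and null is a single zero byte.
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_8))); // 41 00
    }
}
```

A reader that expects plain UTF-8 would therefore see two stray leading
bytes, which is why the method names are misleading.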

-- 
John Cowan	http://www.ccil.org/~cowan		cowan@ccil.org
	You tollerday donsk?  N.  You tolkatiff scowegian?  Nn.
	You spigotty anglease?  Nnn.  You phonio saxo?  Nnnn.
		Clear all so!  'Tis a Jute.... (Finnegans Wake 16.5)

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:42 EDT