Re: Bastardizations of UTF-8 (was: Re: [OT] Unicode-compatible SQL?)

From: Tex Texin (
Date: Mon Feb 05 2001 - 15:48:23 EST


I am not clear from your comments which is the bug, since the doc
goes both ways. Is the doc bug that it says the format is UTF-8,
or that it says the format is modified UTF-8?

It would be great to learn that the functions are actually unmodified
UTF-8, as I know of some interfaces that are written in non-Java
code and are forced to deal with specialized handling of the modified
form. It would be great to inform them they can use standard UTF-8
libraries.
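
One way to check which behaviour the methods actually have is to compare
the bytes writeUTF emits with those of a plain UTF-8 encoder. The following
is only a minimal sketch: the class name ModifiedUtf8Demo and its helper
methods are made up for illustration, and it assumes Java 7 or later for
StandardCharsets and try-with-resources.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of any library.
public class ModifiedUtf8Demo {

    // Bytes produced by DataOutputStream.writeUTF, including the
    // two-byte length prefix that writeUTF puts in front of the data.
    static byte[] viaWriteUTF(String s) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        }
        return buf.toByteArray();
    }

    // Bytes produced by the standard UTF-8 encoder (no length prefix).
    static byte[] viaStandardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Space-separated uppercase hex dump, e.g. "00 02 C0 80".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // An ASCII letter, the NUL character, and one character
        // outside the BMP (U+1D11E, written as a surrogate pair).
        String[] samples = { "A", "\u0000", "\uD834\uDD1E" };
        for (String s : samples) {
            System.out.println("writeUTF: " + hex(viaWriteUTF(s)));
            System.out.println("UTF-8:    " + hex(viaStandardUtf8(s)));
        }
    }
}
```

If the methods follow the format documented for DataOutput, NUL should come
out as C0 80 after the length prefix rather than as a single 00, and the
character outside the BMP as two three-byte sequences rather than one
four-byte sequence. Either difference would mean a standard UTF-8 library
cannot read the data unchanged.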


John O'Conner wrote:
> Perhaps the methods readUTF and writeUTF should be deprecated in favor of
> read/writeString. I will submit an RFE (request for enhancement) for this.
> I noticed that although the Data{Input,Output} interface clearly says that
> write/readUTF handles a "Java modified UTF-8", the actual javadoc in DataOutputStream
> says that writeUTF writes the String as UTF-8. Also, the doc for UTFDataFormatException
> is confusing on the issue, saying UTF-8 in one place and "modified UTF-8" in the doc for
> DataInputStream.
> That's 1 RFE for better method names and 2 bugs in the API documentation! I'll submit all
> 3...if they don't already exist in the db.
> Regards,
> John O'Conner
> John Cowan wrote:
> > The internal encoding is exposed by the regrettably named readUTF and
> > writeUTF methods of Data{Input,Output}Stream, which should have
> > been named readString and writeString. People have assumed that they
> > are general-purpose UTF-8 read/write functions.
> >

According to Murphy, nothing goes according to Hoyle.
---------------------------------------------------------------------------
Tex Texin                      Director, International Business
                               +1-781-280-4271 Fax:+1-781-280-4655
Progress Software Corp.        14 Oak Park, Bedford, MA 01730
                               #1 Embedded Database
Globalization Program
---------------------------------------------------------------------------
