Re: Bastardizations of UTF-8 (was: Re: [OT] Unicode-compatible SQL?)

From: John O'Conner (
Date: Mon Feb 05 2001 - 19:00:04 EST

Here's how I see the problems with the Java API docs:
1. The Data{Input, Output}Stream methods {read, write}UTF could be named better. More
appropriate names are {read, write}String, since they read and write a length-prefixed string
rather than plain UTF-8. Strictly speaking, this is not a bug, but it could be better. That's
why I call it an RFE (request for enhancement).
2. DataOutputStream's writeUTF() method says it writes UTF-8, when what it actually writes is a
"modified" UTF-8. The implementation is fine; the documentation is wrong, because the method
does not write UTF-8 but something slightly different.
3. DataInputStream's readUTF() method is clear that it reads a "modified" UTF-8, but the doc
also says it throws a UTFDataFormatException if the input stream isn't valid UTF-8. The
error is that it says UTF-8 rather than "modified" UTF-8 or FSS-UTF.
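To make points 1 to 3 concrete, here is a minimal sketch, assuming a Java 7 or later runtime. The class name WriteUtfVsUtf8 and its helpers are hypothetical. It compares the bytes writeUTF() produces with standard UTF-8 from String.getBytes(). writeUTF() prepends a 2-byte length, which is why it behaves more like a "writeString". It encodes U+0000 as the two bytes C0 80, and it writes a supplementary character such as U+1F600 as two 3-byte surrogate sequences instead of one 4-byte sequence. readUTF() rejects the standard 4-byte form with a UTFDataFormatException.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class comparing writeUTF output with standard UTF-8.
public class WriteUtfVsUtf8 {
    // Render bytes as space-separated uppercase hex, e.g. "00 03 41 C0 80".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Capture exactly what DataOutputStream.writeUTF emits: 2-byte length, then modified UTF-8.
    static byte[] writeUtf(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // True if readUTF accepts the length-prefixed record, false if it throws UTFDataFormatException.
    static boolean readUtfAccepts(byte[] record) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(record))) {
            in.readUTF();
            return true;
        } catch (UTFDataFormatException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String nul = "A\u0000";
        String emoji = "\uD83D\uDE00"; // U+1F600 held as a UTF-16 surrogate pair

        System.out.println("writeUTF, A + NUL: " + hex(writeUtf(nul)));
        System.out.println("UTF-8,    A + NUL: " + hex(nul.getBytes(StandardCharsets.UTF_8)));
        System.out.println("writeUTF, U+1F600: " + hex(writeUtf(emoji)));
        System.out.println("UTF-8,    U+1F600: " + hex(emoji.getBytes(StandardCharsets.UTF_8)));

        // Standard 4-byte UTF-8 for U+1F600, wrapped in a writeUTF-style 2-byte length prefix.
        byte[] standard = {0x00, 0x04, (byte) 0xF0, (byte) 0x9F, (byte) 0x98, (byte) 0x80};
        System.out.println("readUTF accepts standard UTF-8? " + readUtfAccepts(standard));
    }
}
```

Running main should show 00 03 41 C0 80 against 41 00, and 00 06 ED A0 BD ED B8 80 against F0 9F 98 80. The last line should report false.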

John O'Conner

Tex Texin wrote:

> John,
> I am not clear from your comments which is the bug, since the doc
> goes both ways. Are the doc bugs that they say
> it is UTF-8, or that they say it is modified UTF-8?
> It would be great to learn that the functions actually use unmodified
> UTF-8, as I know of some projects writing non-Java code against these
> interfaces that are forced to handle the modified UTF-8 specially.
> It would be great to tell them they can use standard UTF-8 library
> routines.
> tex
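As John's list makes clear, the answer to Tex's question is that these methods really do use modified UTF-8. A strict, current UTF-8 decoder will reject both the C0 80 form of NUL and the 3-byte surrogate encodings. The following minimal sketch shows what a reader of writeUTF data has to do instead. The class and method names are hypothetical, and it assumes the 2-byte length prefix has already been stripped. It decodes the 1-, 2- and 3-byte sequences straight into UTF-16 code units, which lets C0 80 and surrogate pairs fall out naturally. It then re-encodes the result as standard UTF-8 with String.getBytes(). Doing the decoding by hand, rather than through a charset decoder, is the design choice that makes the two deviations harmless.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical converter from a writeUTF body (length prefix already removed) to standard UTF-8.
public class ModifiedUtf8Decoder {
    // Decode modified UTF-8 into a Java String, one UTF-16 code unit per sequence.
    static String decode(byte[] b) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < b.length) {
            int c = b[i] & 0xFF;
            if (c < 0x80) {                         // 0xxxxxxx
                sb.append((char) c);
                i += 1;
            } else if ((c & 0xE0) == 0xC0) {        // 110xxxxx 10xxxxxx, includes C0 80 for NUL
                requireContinuation(b, i, 1);
                sb.append((char) (((c & 0x1F) << 6) | (b[i + 1] & 0x3F)));
                i += 2;
            } else if ((c & 0xF0) == 0xE0) {        // 1110xxxx 10xxxxxx 10xxxxxx, includes surrogates
                requireContinuation(b, i, 2);
                sb.append((char) (((c & 0x0F) << 12) | ((b[i + 1] & 0x3F) << 6) | (b[i + 2] & 0x3F)));
                i += 3;
            } else {                                // 4-byte leads and stray continuation bytes
                throw new IllegalArgumentException("not modified UTF-8 at byte " + i);
            }
        }
        return sb.toString();
    }

    // Check that the `count` bytes after `lead` exist and look like 10xxxxxx.
    private static void requireContinuation(byte[] b, int lead, int count) {
        for (int k = 1; k <= count; k++) {
            if (lead + k >= b.length || (b[lead + k] & 0xC0) != 0x80) {
                throw new IllegalArgumentException("truncated or malformed sequence at byte " + lead);
            }
        }
    }

    // Decode to UTF-16, then let the platform encoder produce standard UTF-8.
    static byte[] toStandardUtf8(byte[] modified) {
        return decode(modified).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 'A', NUL as C0 80, then U+1F600 as two 3-byte surrogate sequences
        byte[] modified = {0x41, (byte) 0xC0, (byte) 0x80,
                (byte) 0xED, (byte) 0xA0, (byte) 0xBD, (byte) 0xED, (byte) 0xB8, (byte) 0x80};
        for (byte x : toStandardUtf8(modified)) {
            System.out.printf("%02X ", x & 0xFF);
        }
        System.out.println();
    }
}
```

With this input, main should print 41 00 F0 9F 98 80, which is the standard UTF-8 that non-Java code could hand to an ordinary library routine.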
