Sadly, the readUTF/writeUTF methods are NOT the way to read or write
UTF-8 in Java. They are for sending serialized Strings to other Java
processes. This is documented by Sun (but very poorly): I bookmarked the
page on my other machine (the one in California, while I'm out of
town) because it was a surprise to me.
If you want UTF-8, it's an encoding: use a converter, as you would for
any other encoding.
I'm not sure this is a bug, incidentally, because it means that all "I am
not a String" encodings are handled identically. The names are horrid and
the doc is useless, though.
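To make the "use a converter" advice concrete, here is a minimal sketch of
what I mean, assuming a reasonably current JDK. The class and method names
(Utf8Converter, encode, decode) are made up for illustration; the only real
API involved is OutputStreamWriter/InputStreamReader with the "UTF-8"
encoding name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;

// Hypothetical helper class, not part of any library.
class Utf8Converter {
    // Encode a String as standard UTF-8 by writing it through a converter.
    static byte[] encode(String s) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Writer w = new OutputStreamWriter(out, "UTF-8");
        w.write(s);
        w.close(); // flushes the converter into the byte stream
        return out.toByteArray();
    }

    // Decode standard UTF-8 bytes back into a String.
    static String decode(byte[] bytes) throws IOException {
        Reader r = new InputStreamReader(new ByteArrayInputStream(bytes), "UTF-8");
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = r.read()) != -1) {
            sb.append((char) c);
        }
        r.close();
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String original = "caf\u00e9";
        byte[] utf8 = encode(original);
        // 'c', 'a', 'f' take one byte each; U+00E9 takes two.
        System.out.println(utf8.length);                       // prints 5
        System.out.println(decode(utf8).equals(original));     // prints true
    }
}
```

Unlike writeUTF, this produces no length prefix and encodes U+0000 as a
single zero byte, so non-Java code can read it with an ordinary UTF-8
library.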
Addison P. Phillips Globalization Architect
email@example.com B2B Software Integration
+1 408.210.3569 (mobile) +1 408.904.4762 (fax)
On Mon, 5 Feb 2001, John Cowan wrote:
> Tex Texin wrote:
> > I am not clear from your comments which is the bug, since the doc
> > goes both ways. Are the doc bugs that they say
> > it is UTF-8, or that they say it is modified UTF-8?
> It uses modified UTF-8, modified in three ways:
> 1) U+0000 is encoded in two bytes as 0xc0 0x80;
> 2) values above U+FFFF are encoded in six bytes as the UTF-8 encoding
> of their UTF-16 equivalent form;
> 3) the whole string is prefixed with a byte count represented
> as a 2-byte big-endian binary integer.
> > It would be great to learn that the functions are actually unmodified
> > UTF-8, as I know of some interfaces that are writing non-Java
> > code and are forced to deal with specialized handling of the modified
> > UTF-8.
> > It would be great to inform them they can use standard UTF-8 library
> > routines.
> *chomp* No such luck Doc!
> There is / one art || John Cowan <firstname.lastname@example.org>
> no more / no less || http://www.reutershealth.com
> to do / all things || http://www.ccil.org/~cowan
> with art- / lessness \\ -- Piet Hein
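The three modifications John lists in the quoted message can be observed
directly. The following is only a sketch, assuming a reasonably current JDK;
the class and helper names (ModifiedUtf8Demo, writeUtfHex, standardUtf8Hex,
hex) are made up for illustration. It dumps what DataOutputStream.writeUTF
emits next to what a standard UTF-8 conversion emits for the same String.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical demo class, not part of any library.
class ModifiedUtf8Demo {
    // Hex dump of the bytes DataOutputStream.writeUTF produces for s.
    static String writeUtfHex(String s) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DataOutputStream data = new DataOutputStream(out);
        data.writeUTF(s);
        data.flush();
        return hex(out.toByteArray());
    }

    // Hex dump of the standard UTF-8 encoding of s, for comparison.
    static String standardUtf8Hex(String s) throws IOException {
        return hex(s.getBytes("UTF-8"));
    }

    // Format bytes as lowercase hex pairs separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // (1) and (3): U+0000 becomes c0 80, after a 2-byte length of 2.
        System.out.println(writeUtfHex("\u0000"));          // prints: 00 02 c0 80
        System.out.println(standardUtf8Hex("\u0000"));      // prints: 00

        // (2): U+10000 is the surrogate pair D800 DC00 in UTF-16; each
        // surrogate is encoded separately in three bytes, six in total.
        String supplementary = "\uD800\uDC00";
        System.out.println(writeUtfHex(supplementary));     // prints: 00 06 ed a0 80 ed b0 80
        System.out.println(standardUtf8Hex(supplementary)); // prints: f0 90 80 80
    }
}
```

The leading two bytes in each writeUTF line are the big-endian byte count,
which is why a plain UTF-8 decoder cannot be pointed at this data as-is.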
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:18 EDT