Re: Bastardizations of UTF-8 (was: Re: [OT] Unicode-compatible SQL?)

From: John Cowan (jcowan@reutershealth.com)
Date: Mon Feb 05 2001 - 12:56:21 EST


John O'Conner wrote:

> Within a String, the encoding of char values is practically irrelevant. It is a
> hidden encoding that is never exposed to the user...or developer. When you access
> String char values, you use an index to 16-bit Unicode values. To my knowledge,
> Sun does not claim that its internal encoding of String is UTF-8 in any of its API
> documentation.

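The internal encoding is exposed by the regrettably named readUTF and
writeUTF methods of java.io.Data{Input,Output}Stream, which should have
been named readString and writeString. People have assumed that they
are general-purpose UTF-8 read/write functions, but they are not: the
data carries a two-byte length prefix, U+0000 is written as the two
bytes C0 80, and a character outside the BMP is written as two
three-byte sequences, one per surrogate, instead of one four-byte
UTF-8 sequence.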
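To make the difference concrete, here is a minimal sketch (ModifiedUtf8Demo and its helpers are hypothetical names, not anything from the JDK; it assumes Java 8 or later for UncheckedIOException) that writes U+0000 followed by U+1D11E with writeUTF and compares the result with what String.getBytes produces for real UTF-8:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ModifiedUtf8Demo {
    // Format bytes as space-separated uppercase hex, e.g. "C0 80".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Bytes produced by DataOutputStream.writeUTF for s,
    // including its two-byte big-endian length prefix.
    static byte[] writeUtfBytes(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            // Not expected for an in-memory stream, but writeUTF declares it.
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        // U+0000 followed by U+1D11E MUSICAL SYMBOL G CLEF,
        // which a Java String holds as the surrogate pair D834 DD1E.
        String s = "\u0000" + new String(Character.toChars(0x1D11E));
        // Prints "writeUTF: 00 08 C0 80 ED A0 B4 ED B4 9E"
        System.out.println("writeUTF: " + hex(writeUtfBytes(s)));
        // Prints "UTF-8:    00 F0 9D 84 9E"
        System.out.println("UTF-8:    " + hex(s.getBytes(StandardCharsets.UTF_8)));
    }
}
```

A general-purpose UTF-8 decoder will choke on the C0 80 and on the encoded surrogates (and on the length prefix), and readUTF will misread ordinary UTF-8 input for the same reasons.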

At one point, this was a FAQ on this list.

-- 
There is / one art             || John Cowan <jcowan@reutershealth.com>
no more / no less              || http://www.reutershealth.com
to do / all things             || http://www.ccil.org/~cowan
with art- / lessness           \\ -- Piet Hein



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:18 EDT