Re: Bastardizations of UTF-8 (was: Re: [OT] Unicode-compatible SQL?)

From: Tex Texin
Date: Mon Feb 05 2001 - 12:58:17 EST

It does impact developers.

The API for DataInputStream defines FSS_UTF, which includes the funky
null behavior.

Since this API and others use this UTF, it gets into file formats, and
we end up supporting it....
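
A minimal sketch of the difference, assuming a current JDK (the class
and method names here are made up for illustration). It writes a string
holding only U+0000 through DataOutputStream.writeUTF, then encodes the
same string with the standard UTF-8 charset, and dumps both as hex:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of any library.
class ModifiedUtf8Demo {
    // Bytes produced by DataOutputStream.writeUTF:
    // a 2-byte big-endian length, then the encoded characters.
    static byte[] writeUtfBytes(String s) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        }
        return buf.toByteArray();
    }

    // Bytes produced by the standard UTF-8 charset encoder.
    static byte[] standardUtf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Space-separated uppercase hex dump, e.g. "C0 80".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String nul = "\u0000"; // a one-char string holding U+0000
        System.out.println("writeUTF: " + hex(writeUtfBytes(nul)));
        System.out.println("UTF-8:    " + hex(standardUtf8Bytes(nul)));
    }
}
```

This should print "00 02 C0 80" for writeUTF (the length prefix 00 02,
then the two-byte form of U+0000) and a single "00" for the UTF-8
converter. Any file format built on writeUTF inherits the first form.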


John O'Conner wrote:
> Within a String, the encoding of char values is practically irrelevant. It is a
> hidden encoding that is never exposed to the user...or developer. When you access
> String char values, you use an index to 16-bit Unicode values. To my knowledge,
> Sun does not claim that its internal encoding of String is UTF-8 in any of its API
> documentation.
> Any component or converter that claims to produce a UTF-8 encoding should not
> behave as you describe. For example, Java's UTF-8 converter does not encode U+0000
> as 0xC0 0x80. If it ever does, please file a bug.
> Regards,
> John O'Conner
> wrote:
> > This is laziness, intended to get around the "problem" of supplementary code
> > points instead of handling them like any other code points. This reminds me
> > of the Java bastardization of UTF-8, in which U+0000 is encoded 0xC0 0x80 so
> > that no character string will ever contain the byte 0x00. (Nobody has ever
> > explained to me why a character string would contain U+0000 in the first
> > place.)

According to Murphy, nothing goes according to Hoyle.
Tex Texin                      Director, International Business      +1-781-280-4271 Fax:+1-781-280-4655
Progress Software Corp.        14 Oak Park, Bedford, MA 01730 #1 Embedded Database

Globalization Program
