Re: Bastardizations of UTF-8 (was: Re: [OT] Unicode-compatible SQL?)

From: Tex Texin (texin@progress.com)
Date: Mon Feb 05 2001 - 12:58:17 EST


John,
It does impact developers.

The API for DataInputStream defines FSS_UTF, which includes the funky
null behavior (U+0000 encoded as 0xC0 0x80).

http://java.sun.com/products/jdk/1.2/docs/api/java/io/DataInputStream.html

Since this API and others use this UTF, it gets into file formats, and
applications end up supporting it....
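
To make the difference concrete, here is a minimal sketch. The class and
helper names are mine and purely illustrative, and StandardCharsets needs
a much newer JDK than the 1.2 docs linked above. It compares the bytes
that DataOutputStream.writeUTF emits with what the ordinary UTF-8
converter produces for a string containing U+0000.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative only: the class and method names are not from any Java API.
class ModifiedUtf8Demo {

    // Bytes written by DataOutputStream.writeUTF, without the
    // 2-byte length prefix that writeUTF puts in front of the data.
    static byte[] writeUtfPayload(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new UncheckedIOException(e);
        }
        byte[] all = buf.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    // Render bytes as space-separated uppercase hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String s = "A\u0000B";
        // writeUTF: U+0000 becomes the two-byte form C0 80
        System.out.println(hex(writeUtfPayload(s)));                  // 41 C0 80 42
        // standard UTF-8 converter: U+0000 stays a single 00 byte
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_8)));  // 41 00 42
    }
}
```

So John's point about the converter and mine about the stream API are
both true: any file written through writeUTF carries the C0 80 form, and
whatever reads that file has to cope with it.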

tex

John O'Conner wrote:
>
> Within a String, the encoding of char values is practically irrelevant. It is a
> hidden encoding that is never exposed to the user...or developer. When you access
> String char values, you use an index to 16-bit Unicode values. To my knowledge,
> Sun does not claim that its internal encoding of String is UTF-8 in any of its API
> documentation.
>
> Any component or converter that claims to produce a UTF-8 encoding should not
> behave as you describe. For example, Java's UTF-8 converter does not encode U+0000
> as 0xC0 0x80. If it ever does, please file a bug.
>
> Regards,
> John O'Conner
>
> DougEwell2@cs.com wrote:
>
> > This is laziness, intended to get around the "problem" of supplementary code
> > points instead of handling them like any other code points. This reminds me
> > of the Java bastardization of UTF-8, in which U+0000 is encoded 0xC0 0x80 so
> > that no character string will ever contain the byte 0x00. (Nobody has ever
> > explained to me why a character string would contain U+0000 in the first
> > place.)

-- 
According to Murphy, nothing goes according to Hoyle.
--------------------------------------------------------------------------
Tex Texin                      Director, International Business
mailto:Texin@Progress.com      +1-781-280-4271 Fax:+1-781-280-4655
Progress Software Corp.        14 Oak Park, Bedford, MA 01730

http://www.Progress.com #1 Embedded Database

Globalization Program http://www.Progress.com/partners/globalization.htm
---------------------------------------------------------------------------



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:18 EDT