Re: UTF-8 as character set (was: Java and UTF)

From: Glen Perkins (gperkins@netcom.com)
Date: Fri Jul 04 1997 - 14:50:36 EDT

Next message: Jonathan Rosenne: "Re: MES instead of ISO 8859-nn"
Previous message: Pierre Lewis: "re:UTF-8 as character set (was: Java and UTF)"
Maybe in reply to: Kenneth Whistler: "UTF-8 as character set (was: Java and UTF)"

Pierre Lewis <lew@nortel.ca> wrote:
>
>
> Thanks for the very useful clarifications (including on terminology).
> I knew about these new Java classes that convert between
> various encodings (e.g. CP850 <--> Unicode), but, because I had an
> algorithmic view of what UTF-8 meant, it never occurred to me to
> search in there (and since the book doesn't have the table of
> supported encodings, it didn't jump out at me either).
>

For the time being, it's better to search the source for that
information. The converter system is still a bit rough and the docs
don't cover it very well yet.

> Still, one more question. What exception would InputStreamReader
> throw on getting non-standard (e.g. language-tagged :-) ) UTF-8?
> UTFDataFormatException? My book associates this exception only
> with the DataInput interface. Another source of my confusion.
>

I'm not sure what is *supposed* to happen eventually, but for now it
throws an InternalError and asks you to report it to JavaSoft. I doubt
they will want to keep it this way long term: conversion errors can
certainly be caused by bad data, not just converter bugs, and the
programmer should be able to get some feedback in that case.
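
For comparison, the DataInput path Pierre mentions does report bad data
through an ordinary exception. Here is a minimal sketch of that; the
class name (ReadUtfDemo) and the byte arrays are made up for
illustration. Note that readUTF() expects a two-byte length prefix
followed by Java's modified UTF-8, so it is not a general UTF-8 reader.
The "bad" payload is a single lone continuation byte (0x80).

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;

class ReadUtfDemo {
    // Returns true if DataInputStream.readUTF() rejects the bytes as
    // malformed, false if it decodes them without complaint.
    static boolean isRejected(byte[] data) {
        DataInputStream in =
            new DataInputStream(new ByteArrayInputStream(data));
        try {
            in.readUTF();
            return false;
        } catch (UTFDataFormatException e) {
            // Bad byte sequence: this is the feedback the caller wants.
            return true;
        } catch (IOException e) {
            // Anything else (e.g. truncated input) is not a format error.
            throw new RuntimeException(e.toString());
        }
    }

    public static void main(String[] args) {
        byte[] good = { 0x00, 0x01, 0x41 };        // length 1, then 'A'
        byte[] bad  = { 0x00, 0x01, (byte) 0x80 }; // length 1, then 0x80
        System.out.println(isRejected(good)); // prints "false"
        System.out.println(isRejected(bad));  // prints "true"
    }
}
```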

Maybe Mark Davis could comment on the ultimate goal for converter
behavior in the case of bad input data. For now, here's the method
that's called when you hit an illegal byte sequence:

======================================
    private void malfunction() {
        throw new InternalError("Converter malfunction (" +
                                btc.getCharacterEncoding() +
                                ") -- please send a bug report to" +
                                " java-io@java.sun.com");
    }
=======================================

The call btc.getCharacterEncoding() returns a String, such as "UTF8",
naming the byte-to-char converter that "malfunctioned".
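
Until the converters report bad input more gracefully, one possible
workaround is to catch the InternalError where you read and turn it
into an ordinary IOException. A rough sketch, assuming the behavior
shown above: the class and method names (SafeReader, readAll) are
hypothetical, and rethrowing as CharConversionException is just one
plausible choice, not anything JavaSoft has specified.

```java
import java.io.CharConversionException;
import java.io.IOException;
import java.io.Reader;

class SafeReader {
    // Reads everything from 'in', converting a converter "malfunction"
    // (InternalError) into a checked CharConversionException.
    static String readAll(Reader in) throws IOException {
        StringBuffer sb = new StringBuffer();
        char[] buf = new char[1024];
        try {
            int n;
            while ((n = in.read(buf, 0, buf.length)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (InternalError e) {
            // Assumption: an InternalError raised here means bad input
            // bytes. It could also be a genuine converter or VM bug, so
            // the original message is passed along unchanged.
            throw new CharConversionException(e.getMessage());
        }
        return sb.toString();
    }
}
```

You would call it as SafeReader.readAll(new InputStreamReader(stream,
"UTF8")) and handle CharConversionException like any other I/O failure.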

__Glen Perkins__
glen.perkins@nativeguide.com

