Re: Unicode character transformation through XSLT

From: Markus Scherer (
Date: Fri Mar 14 2003 - 12:40:40 EST

    Nooo - Java's old "UTF" functions do not process UTF-8! They exist for String serialization and
    use a Java-internal format.
    Use the Java Reader/Writer classes instead of these old ones!
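
    A minimal sketch of the Reader approach, assuming the incoming bytes really are standard UTF-8.
    The class and method names (Utf8ReaderSketch, readUtf8) are only illustrative, and
    StandardCharsets needs Java 7 or later; on older JDKs the charset name "UTF-8" can be passed to
    InputStreamReader instead.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class Utf8ReaderSketch {
    // Decode every byte of the stream as standard UTF-8 and return the text.
    // (Hypothetical helper, not part of any library.)
    static String readUtf8(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // C3 A9 is the standard UTF-8 encoding of U+00E9.
        byte[] utf8 = {(byte) 0xC3, (byte) 0xA9};
        String s = readUtf8(new ByteArrayInputStream(utf8));
        System.out.println(Integer.toHexString(s.charAt(0)));
    }
}
```

    Any InputStream (for example one obtained from a database result) could be passed to
    readUtf8 in place of the ByteArrayInputStream used here.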

    See the Java tutorials on Internationalization:

    See the descriptions of readUTF() functions (highlighting with ***):

    "Reads from the stream in a representation of a Unicode character string encoded in ***Java modified
    UTF-8*** format; this string of characters is then returned as a String. The details of the
    ***modified UTF-8*** representation are exactly the same as for the readUTF method of DataInput."

    Java's *modified* UTF-8 in its "UTF" functions resembles CESU-8 (supplementary characters are
    written as surrogate pairs, three bytes each), and writes U+0000 with two bytes instead of one,
    as far as I remember.
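
    The difference can be checked with a small sketch like the following, which compares the bytes
    from DataOutputStream.writeUTF() with those from String.getBytes() using standard UTF-8. The
    class and helper names are made up for illustration; note that writeUTF() also prepends a
    two-byte length, so its output is longer than the encoded string alone.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class ModifiedUtf8Sketch {
    // Bytes produced by writeUTF: a 2-byte length prefix followed by modified UTF-8.
    static byte[] writeUtfBytes(String s) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        }
        return bytes.toByteArray();
    }

    // Format bytes as space-separated uppercase hex (illustrative helper).
    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        String nul = "\0";                                         // U+0000
        String supplementary = new String(Character.toChars(0x10000)); // U+10000

        System.out.println("U+0000  writeUTF: " + hex(writeUtfBytes(nul)));
        System.out.println("U+0000  UTF-8:    " + hex(nul.getBytes(StandardCharsets.UTF_8)));
        System.out.println("U+10000 writeUTF: " + hex(writeUtfBytes(supplementary)));
        System.out.println("U+10000 UTF-8:    " + hex(supplementary.getBytes(StandardCharsets.UTF_8)));
    }
}
```

    For U+0000 this should show 00 02 C0 80 from writeUTF() against a single 00 in real UTF-8, and
    for U+10000 it should show 00 06 ED A0 80 ED B0 80 against F0 90 80 80.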


    Yung-Fong Tang wrote:
    > what is rsResult? Blob?
    > you probably need to use
    > BufferedInputStream
    > and
    > DataInputStream
    > to pipe the InputStream
    > and use readChar or readUTF in the InputStream interface instead.
    > See and
    > for more info.

    Opinions expressed here may not reflect my company's positions unless otherwise noted.
