Re: Opinions on this Java URL?

From: Philippe Verdy (
Date: Sat Nov 13 2004 - 21:41:32 CST

    From: "Doug Ewell" <>

    > What is a shame is that Unicode published a definition of the defective
    > CESU-8 at all.

    On that point at least we agree. I wonder why CESU-8 was created at all,
    and whether any applications actually need it.

    On the other hand, Java's modified UTF-8 (which is in fact closer to
    CESU-8) has proven useful and is widely used, simply because it never
    contains a zero byte and is therefore compatible with the standard C
    library functions for null-terminated strings. It is a historical format
    that has coexisted well with Unicode, thanks to the earlier tolerance of
    legacy UTF-8 decoders. Even today it still conforms to Unicode rules,
    because Java does not claim that it is UTF-8 and does not label encoded
    data as UTF-8: it is used internally in the JNI interfaces and in the Java
    class file format (which is not plain text), and both are part of the JVM
    specifications and are not intended for data interchange between distinct
    hosts or applications.
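    To make the null-terminated-string point concrete, here is a small sketch
    built on DataOutputStream.writeUTF, which writes Java's modified UTF-8
    after a 2-byte length prefix. The class and helper names (ModifiedUtf8Demo,
    modifiedUtf8, hex) are made up for illustration. It compares the output
    with standard UTF-8: U+0000 becomes C0 80 instead of 00, and a
    supplementary character is written as two 3-byte surrogate encodings (as
    in CESU-8) rather than as one 4-byte sequence.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ModifiedUtf8Demo {
    // Encodes s with DataOutputStream.writeUTF (Java's modified UTF-8) and
    // strips the 2-byte big-endian length prefix that writeUTF writes first.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] framed = bos.toByteArray();
        return Arrays.copyOfRange(framed, 2, framed.length);
    }

    // Formats bytes as space-separated uppercase hex, e.g. "41 C0 80 42".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 'A', U+0000, 'B': standard UTF-8 keeps a 00 byte, modified UTF-8 does not.
        String withNul = "A\u0000B";
        System.out.println("standard UTF-8: " + hex(withNul.getBytes(StandardCharsets.UTF_8)));
        System.out.println("modified UTF-8: " + hex(modifiedUtf8(withNul)));

        // U+1F600 is held in a Java String as the surrogate pair D83D DE00.
        String emoji = "\uD83D\uDE00";
        System.out.println("standard UTF-8: " + hex(emoji.getBytes(StandardCharsets.UTF_8)));
        System.out.println("modified UTF-8: " + hex(modifiedUtf8(emoji)));
    }
}
```

    Run as-is, it should print 41 00 42 versus 41 C0 80 42 for the first
    string, and F0 9F 98 80 versus ED A0 BD ED B8 80 for U+1F600.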

    But tolerance for non-shortest forms did exist in those decoders, so the
    sequence C0,80 would be safely interpreted as NUL (U+0000).
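    The contrast between a strict decoder and the older, tolerant behaviour
    can be sketched with two JDK calls: a CharsetDecoder obtained from
    StandardCharsets.UTF_8.newDecoder(), which on current JDKs reports C0 as a
    malformed lead byte, and DataInputStream.readUTF, which still maps C0,80 to
    U+0000. The class and method names are illustrative only; since readUTF
    expects a 2-byte length prefix, the sketch adds one.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

public class NonShortestNul {
    // The two-byte, non-shortest form that modified UTF-8 uses for U+0000.
    static final byte[] C0_80 = {(byte) 0xC0, (byte) 0x80};

    // Strict decoding: a fresh CharsetDecoder reports malformed input by
    // default, so a rejected sequence surfaces as CharacterCodingException.
    static boolean strictUtf8Accepts(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Tolerant decoding: DataInputStream.readUTF reads modified UTF-8 and
    // decodes any 2-byte form, including C0 80, without a shortest-form check.
    static String readModifiedUtf8(byte[] payload) {
        byte[] framed = new byte[payload.length + 2];
        framed[0] = (byte) (payload.length >>> 8); // big-endian length prefix
        framed[1] = (byte) payload.length;
        System.arraycopy(payload, 0, framed, 2, payload.length);
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(framed))) {
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("strict UTF-8 accepts C0 80: " + strictUtf8Accepts(C0_80));
        String s = readModifiedUtf8(C0_80);
        System.out.println("readUTF length " + s.length() + ", char code " + (int) s.charAt(0));
    }
}
```

    On a current JDK this should report that the strict decoder rejects C0 80,
    while readUTF returns a one-character string whose code is 0.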

    Another way to think about Java's modified UTF-8 is as a transport
    encoding syntax for CESU-8. It differs from CESU-8 mainly in two ways: it
    escapes U+0000 as the two bytes C0,80 instead of a single null byte (the
    lead byte C0 never occurs in CESU-8), and it tolerates isolated/unpaired
    surrogates, i.e. ill-formed UTF-16 code unit sequences, in the encoded
    string, which the CESU-8 scheme does not allow. So why would Sun change
    anything there? Replacing something that works with a new API that would
    create incompatibilities does not seem like a good idea.
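    The "transport syntax" view can be checked with a sketch: encode a string
    as CESU-8, replace every 00 byte with C0 80, and compare the result with
    what DataOutputStream.writeUTF produces (minus its length prefix). The
    cesu8 helper is a hypothetical hand-written encoder that encodes each
    UTF-16 code unit separately; unlike a strict CESU-8 encoder it does not
    reject unpaired surrogates, which mirrors the laxity of modified UTF-8.
    The reverse mapping (C0 80 back to 00) is unambiguous because C0 cannot
    start any other sequence.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class CesuTransport {
    // Hypothetical helper: encodes each UTF-16 code unit on its own as 1-3
    // bytes, so a surrogate pair becomes 6 bytes and U+0000 becomes 00.
    // For well-formed input this is CESU-8; lone surrogates are not rejected.
    static byte[] cesu8(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                out.write(c);
            } else if (c < 0x800) {
                out.write(0xC0 | (c >> 6));
                out.write(0x80 | (c & 0x3F));
            } else {
                out.write(0xE0 | (c >> 12));
                out.write(0x80 | ((c >> 6) & 0x3F));
                out.write(0x80 | (c & 0x3F));
            }
        }
        return out.toByteArray();
    }

    // CESU-8 to modified UTF-8: the only change is 00 -> C0 80.
    static byte[] escapeNul(byte[] cesu) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte b : cesu) {
            if (b == 0) {
                out.write(0xC0);
                out.write(0x80);
            } else {
                out.write(b);
            }
        }
        return out.toByteArray();
    }

    // Modified UTF-8 to CESU-8: C0 80 -> 00; every other byte is copied.
    static byte[] unescapeNul(byte[] mutf8) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < mutf8.length; i++) {
            boolean isEscapedNul = (mutf8[i] & 0xFF) == 0xC0
                    && i + 1 < mutf8.length && (mutf8[i + 1] & 0xFF) == 0x80;
            if (isEscapedNul) {
                out.write(0);
                i++; // skip the 80 continuation byte
            } else {
                out.write(mutf8[i]);
            }
        }
        return out.toByteArray();
    }

    // Reference output: writeUTF payload without its 2-byte length prefix.
    static byte[] writeUtfPayload(String s) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] framed = bos.toByteArray();
        return Arrays.copyOfRange(framed, 2, framed.length);
    }

    public static void main(String[] args) {
        // 'x', U+0000, e-acute, the pair for U+1F600, then a lone high surrogate.
        String s = "x\u0000\u00E9\uD83D\uDE00\uD800";
        byte[] cesu = cesu8(s);
        byte[] mutf8 = escapeNul(cesu);
        System.out.println(Arrays.equals(mutf8, writeUtfPayload(s)));
        System.out.println(Arrays.equals(unescapeNul(mutf8), cesu));
    }
}
```

    Both comparisons should print true, including for the lone high surrogate
    at the end of the test string.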