Re: U+0000 in C strings

From: Mark Davis
Date: Mon Nov 15 2004 - 11:33:17 CST

    Every few years it seems that this subject blossoms on the list.

    Remember that this stuff was done a long time ago. A variant of UTF-8 was
    devised by the Java people that would allow them to *losslessly* convert
    between String and a representation that C could handle. And losslessly
    means that since U+0000 is legal in String, it had to be representable
    anywhere in the C string. This was done very early in the development of
    Java, even before there was an internationalization group in JavaSoft.
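
    As a rough illustration of the point above, here is a minimal sketch of
    the U+0000 rule only: the character is written as the two-byte sequence
    C0 80 instead of a single zero byte, so the result never contains an
    embedded terminator and still converts back without loss. The function
    names are hypothetical, and the sketch makes no attempt to cover any other
    way in which Java's variant differs from standard UTF-8.

```python
# Sketch of the U+0000 handling in the modified UTF-8 variant.
# Assumption: only the NUL rule is modeled; function names are hypothetical.

def encode_modified_utf8(text: str) -> bytes:
    """Encode text as UTF-8, but write U+0000 as the two bytes C0 80."""
    # In standard UTF-8 a 0x00 byte can only come from U+0000,
    # so a plain byte replacement is sufficient for this rule.
    return text.encode("utf-8").replace(b"\x00", b"\xc0\x80")


def decode_modified_utf8(data: bytes) -> str:
    """Reverse encode_modified_utf8, restoring U+0000."""
    # The byte C0 never occurs in valid standard UTF-8, so the pair
    # C0 80 can only be the NUL substitute introduced above.
    return data.replace(b"\xc0\x80", b"\x00").decode("utf-8")


original = "a\u0000b"
encoded = encode_modified_utf8(original)

print(encoded.hex())                              # 61c08062
print(b"\x00" in encoded)                         # False
print(decode_modified_utf8(encoded) == original)  # True
```

    Because the encoded bytes contain no zero byte, a C routine such as
    strlen would see the whole string rather than stopping at the embedded
    U+0000.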

    The only real problem with this was that they simply called this UTF-8 at
    that time. They have since documented, in response to requests by the
    Unicode Consortium, that this is a modified variant of UTF-8. It is worked
    too heavily into the structure of Java for them to do much beyond
    documenting, and I really haven't heard of real cases where this has caused
    a problem.

    I doubt that any further discussion of this will be productive.

