Re: U+0000 in C strings

From: Mark Davis ([email protected])
Date: Mon Nov 15 2004 - 11:33:17 CST

Next message: Peter Kirk: "Re: U+0000 in C strings"

Previous message: Philippe Verdy: "Re: U+0000 in C strings (was: Re: Opinions on this Java URL?)"
In reply to: Doug Ewell: "Re: U+0000 in C strings"
Next in thread: Peter Kirk: "Re: U+0000 in C strings"

Every few years it seems that this subject blossoms on the list.

Remember that this stuff was done a long time ago. A variant of UTF-8 was
devised by the Java people that would allow them to *losslessly* convert
between String and a representation that C could handle. And losslessly
means that since U+0000 is legal in String, it had to be representable
anywhere in the C string. This was done very early in the development of
Java, even before there was an internationalization group in Javasoft.
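As a rough illustration of the point above (not Java's actual code), the
variant can be sketched in a few lines of Python. The function name
encode_modified_utf8 is hypothetical. The sketch assumes the documented
behavior of the variant: U+0000 is written as the two bytes C0 80 rather
than a single 00 byte, and supplementary characters are written as two
3-byte sequences, one per UTF-16 surrogate.

```python
def encode_modified_utf8(text: str) -> bytes:
    """Sketch of a modified-UTF-8 encoder (illustrative, not Java's code)."""
    out = bytearray()
    # Work on UTF-16 code units, since that is what a Java String holds.
    utf16 = text.encode("utf-16-be", "surrogatepass")
    for i in range(0, len(utf16), 2):
        unit = (utf16[i] << 8) | utf16[i + 1]
        if 0x0001 <= unit <= 0x007F:
            # Plain ASCII except NUL: one byte.
            out.append(unit)
        elif unit <= 0x07FF:
            # Two bytes. U+0000 lands here, giving C0 80 instead of 00.
            out.append(0xC0 | (unit >> 6))
            out.append(0x80 | (unit & 0x3F))
        else:
            # Three bytes, including each half of a surrogate pair.
            out.append(0xE0 | (unit >> 12))
            out.append(0x80 | ((unit >> 6) & 0x3F))
            out.append(0x80 | (unit & 0x3F))
    return bytes(out)


standard = "A\u0000B".encode("utf-8")         # contains a raw 00 byte
modified = encode_modified_utf8("A\u0000B")   # contains C0 80 instead
```

Because the modified form never contains a 00 byte, a C function that
stops at the first NUL still sees the whole string, which is the lossless
round trip described above.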

The only real problem was that they simply called this UTF-8 at the
time. They have since documented, in response to requests by the
Unicode Consortium, that it is a modified variant of UTF-8. It is worked
too deeply into the structure of Java for them to do much beyond
documenting it, and I really haven't heard of real cases where this has
caused a problem.

I doubt that any further discussion of this will be productive.

Mark

This archive was generated by hypermail 2.1.5 : Mon Nov 15 2004 - 11:39:14 CST