From: D. Starner (shalesller@writeme.com)
Date: Sun Nov 14 2004 - 23:43:23 CST
"Philippe Verdy" writes:
> Nulls are legal Unicode characters, also for use in plain text and since
> ever in ASCII, and all ISO 8-bit charset standards. Why do you want that a
> legal Unicode string containing NULL (U+0000) *characters* become illegal
> when converted to C strings?
Why do you need a nul? They're not exactly legal characters in plain text;
I know of no program that would do anything constructive with them in
plain text. A file with arbitrary control characters in it is generally
not a plain text file; an escape code certainly has no fixed meaning and
where it does have meaning it does things, like underlining and highlighting
and other things, that aren't exactly plain text.
> A null *CHARACTER* is valid in C string, because C does not mandate the
> string encoding (which varies according to locale conventions at run-time).
That's specious. The string encoding in C since time immemorial has generally
been a variety of ASCII or EBCDIC, both of which make the null character
the null byte.
> Using pure UTF-8 in C strings would not be conforming to either Unicode or C
> conventions because it will illegitimately restrict the legal embedding of
> U+0000 in strings...
That's nothing new; C has restricted the embedding of U+0000 in strings since
the very first compiler. ASCII is no different from UTF-8 here.
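A minimal sketch of that point, assuming nothing beyond the standard C library: strlen() stops at the first zero byte whether the bytes around it are ASCII or multi-byte UTF-8. The helper name c_string_length is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: the number of bytes the C library treats as
 * "the string", no matter how many bytes the buffer really holds. */
size_t c_string_length(const char *buf)
{
    assert(buf != NULL);
    return strlen(buf);   /* counts bytes up to the first 0x00 byte */
}
```

Called on "ab\0cd" (plain ASCII) or on "\xC3\xA9\0z" (U+00E9 in UTF-8, then U+0000, then "z"), it reports 2 in both cases: everything after the embedded NUL is invisible to the string functions.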
I've never seen code to make strings in C that hold nulls; I've never seen anybody
use that as a reason that Java or any other language was better than C. The fact
that you can't put NUL in a C string is both true and seemingly moot. Java's
solution for emitting it to a C string is creative and probably useful for the situation,
but the result should never have been written to disk.
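For reference, the Java trick is to write U+0000 as the two-byte overlong sequence 0xC0 0x80, so that the encoded data contains no zero byte. The sketch below shows only that one substitution; the function name escape_nul_bytes and its signature are hypothetical, and it makes no attempt at the rest of Java's modified UTF-8 (such as its handling of supplementary characters).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy src_len bytes into dst, writing each 0x00
 * byte as the pair 0xC0 0x80, then append a real terminator.
 * Returns the number of bytes written before the terminator, or
 * (size_t)-1 if dst_cap is too small to hold the result. */
size_t escape_nul_bytes(const unsigned char *src, size_t src_len,
                        unsigned char *dst, size_t dst_cap)
{
    size_t out = 0;

    assert(src != NULL && dst != NULL);
    if (dst_cap == 0)
        return (size_t)-1;

    for (size_t i = 0; i < src_len; i++) {
        size_t need = (src[i] == 0x00) ? 2 : 1;

        if (out + need >= dst_cap)   /* keep one byte for the terminator */
            return (size_t)-1;

        if (src[i] == 0x00) {
            dst[out++] = 0xC0;       /* overlong form of U+0000 */
            dst[out++] = 0x80;
        } else {
            dst[out++] = src[i];
        }
    }
    dst[out] = 0x00;                 /* the only zero byte in the output */
    return out;
}
```

The output is safe to hand to any C string function, but it is not valid UTF-8: a conforming decoder treats 0xC0 0x80 as an ill-formed overlong sequence, which is why this form belongs in memory and not in files.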
This archive was generated by hypermail 2.1.5 : Sun Nov 14 2004 - 23:46:15 CST