Re: CodePage Information

From: Doug Ewell (dewell@adelphia.net)
Date: Fri May 23 2003 - 00:05:16 EDT

Next message: Asmus Freytag: "Re: Is it true that Unicode is insufficient for Oriental languages?"

Previous message: Kenneth Whistler: "Re: Is it true that Unicode is insufficient for Oriental languages?"
In reply to: Philippe Verdy: "Re: CodePage Information"
Next in thread: Kenneth Whistler: "Re: CodePage Information"

Philippe Verdy <verdy_p at wanadoo dot fr> wrote:

> The main reason why the 0x00 byte causes problems is because it is
> most often used as a string terminator, unlike what ASCII or Unicode
> defines for the NULL character. In this case, one cannot encode it
> because the device or protocol does not support sending a separate
> length specifier and needs the 0x00 to terminate the string, and thus
> a NULL character in a Unicode string could not be encoded even if it's
> needed.

Everything Ken said about the advisability, and the past and present
permissibility, of using non-shortest UTF-8 is true.
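For anyone joining the thread late, here is a minimal sketch of what "non-shortest" means for U+0000. It assumes the form under discussion is the two-byte sequence 0xC0 0x80, which is the usual way of smuggling a NULL past a 0x00 terminator. The helper name decodes_as_utf8 is mine, for illustration only, and Python's built-in decoder stands in for any conformant UTF-8 decoder.

```python
# Shortest-form UTF-8 for U+0000 is the single byte 0x00.
shortest = "\u0000".encode("utf-8")

# Assumed non-shortest ("overlong") two-byte form that avoids a literal 0x00 byte.
OVERLONG_NUL = b"\xc0\x80"


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if a strict UTF-8 decoder accepts the byte sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(shortest)                       # b'\x00'
print(decodes_as_utf8(shortest))      # True
print(decodes_as_utf8(OVERLONG_NUL))  # False: 0xC0 is never a valid lead byte
```

A strict decoder rejects the overlong form outright, so data written that way is not interchangeable as UTF-8.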

I'd like to ask a different question, one that steps away from Unicode
for a minute and addresses the broader concept of text storage and
processing:

    What real-world situations call for a NULL character to be stored
    as part of a text string, in conflict with its use in the C
    language (etc.) as a string terminator?

Basically you are making the claim that 0x00 might be used not only as a
string terminator (not part of the string per se) but also for some
other purpose WITHIN the string, so that the two uses of 0x00 need to be
distinguished. But what other uses of 0x00 are there within a string?
I can't think of any.

There's a reason why neither Unicode nor any other coded character set
(including the ISO 2022 mechanism) assigns a specific function to 0x00.
It is too valuable in its role as a NULL character.

Of course, an arbitrary binary stream might well contain 0x00 bytes, but
then it would not be appropriate, for a variety of reasons, to attempt
to perform text processing functions on such a stream.
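To illustrate the point, here is a rough sketch of what happens when C-style string semantics meet a buffer that contains an embedded 0x00. It uses Python's ctypes module only as a convenient stand-in for C behavior; the sample bytes are invented for the example.

```python
import ctypes

# Hypothetical binary payload with a 0x00 byte in the middle.
payload = b"abc\x00def"

# create_string_buffer copies the bytes and appends one trailing 0x00,
# just as a C char array initialized from a literal would have.
buf = ctypes.create_string_buffer(payload)

# .value reads the buffer as a NUL-terminated C string: it stops at the first 0x00.
as_c_string = buf.value

# .raw returns every byte in the buffer, including the terminator.
as_raw_bytes = buf.raw

print(as_c_string)        # b'abc'
print(as_raw_bytes)       # b'abc\x00def\x00'
print(len(as_c_string))   # 3
print(len(payload))       # 7
```

Everything after the first 0x00 is invisible to the C-string view, so the length has to travel separately from the data if the whole payload is to survive.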

-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/


This archive was generated by hypermail 2.1.5 : Fri May 23 2003 - 00:50:54 EDT