From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Wed Nov 24 2004 - 16:16:41 CST
At 04:23 PM 11/23/2004, Chris Jacobs wrote:
>Now, this implies that UTF-8 does interpret U+0000 as an ASCII NULL
>control char.
>This is incompatible with using it as a string terminator.
Except that it's up to you how to interpret the C0 control codes in Unicode.
You can do it according to ISO 6429 (where the NUL has a specific significance)
or you can do it according to any other scheme.
The fact is that ambiguity in interpreting control codes is very old practice:
device control data streams treat these codes differently than they are treated
in plain text files (or in programming language literal data).
Unicode simply recognized this and did not wish to enforce a particular usage.
However, lately, efforts have been under way to make the default behavior of
several of the C0 controls formally match their customary semantics in plain
text data.
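To make the terminator question concrete, here is a minimal Python sketch (not
taken from any particular library). It shows that UTF-8 encodes U+0000 as the
single byte 0x00, and that whether that byte is ordinary data or an end marker
is decided by the reader, not by the encoding. The helper name `c_string_view`
is hypothetical and only mimics what a NUL-terminated reader would see.

```python
# Sketch only: U+0000 encodes to the single byte 0x00 in UTF-8.
text = "ab\u0000cd"
encoded = text.encode("utf-8")

# Length-counted interpretation: NUL is just another code point,
# so the full string survives a round trip.
decoded = encoded.decode("utf-8")


def c_string_view(data: bytes) -> bytes:
    """Hypothetical helper: the bytes a NUL-terminated reader would see."""
    # Everything from the first 0x00 onward is ignored by such a reader.
    return data.split(b"\x00", 1)[0]


# Terminator interpretation: the same bytes are cut at the first 0x00.
terminated = c_string_view(encoded)
```

Both interpretations are applied to identical bytes; the conflict only arises
when a process that assumes one convention receives data produced under the
other.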
I'm not seeing a lot in this thread that adds to the store of knowledge on this
issue, but I see a number of statements that are easily misconstrued or
misapplied, including the thoroughly discredited practice of storing information
in the high bit when piping seven-bit data through eight-bit pathways. The
problem with that approach, of course, is that the assumption that there were
never going to be 8-bit data in these same pipes proved fatally wrong.
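A small Python sketch of why the high-bit trick fails, under the assumption that
the flag is simply OR-ed into bit 7 of each byte. The names `tag` and `untag`
are hypothetical and stand in for whatever a given pipeline did.

```python
from typing import Tuple


def tag(byte: int) -> int:
    """Hypothetical: set the high bit as an out-of-band marker."""
    return byte | 0x80


def untag(byte: int) -> Tuple[int, bool]:
    """Hypothetical: strip the high bit and report whether it was set."""
    return byte & 0x7F, bool(byte & 0x80)


# With genuine seven-bit data the round trip works.
seven_bit_result = untag(tag(ord("A")))

# With 8-bit data it does not: "é" in UTF-8 is the bytes C3 A9, both of
# which already have the high bit set. The reader strips the bit and
# treats each byte as flagged, although nobody tagged it.
eight_bit = "é".encode("utf-8")
recovered = bytes(untag(b)[0] for b in eight_bit)
falsely_flagged = all(untag(b)[1] for b in eight_bit)
```

Here the reader cannot tell a deliberately tagged seven-bit byte from an
untouched 8-bit one, so the original data is lost.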
A./