From: Antoine Leca (Antoine10646@leca-marti.org)
Date: Tue Nov 23 2004 - 18:21:54 CST
Philippe Verdy wrote:
> From: "Antoine Leca" <Antoine10646@leca-marti.org>
>> For example, ASCII as designed allowed (please note I did not write
>> "was designed to allow") the use of the 8th bit as parity bit when
>> transmitted as octet on a telecommunication line; I doubt such use is
>> compatible with UTF-8.
> The parity bit is not data; it's a framing bit used for transport/link
> purpose only.
Did I say otherwise?
Even if it is not "data", you can store it inside an octet, along with 7
bits of /data/. You cannot do something similar if you have 8 bits of data:
it won't fit inside the octet. Which was my point.
> ASCII is 7 bit only, so even if a parity bit is added (parity bit can
> be added as well to 8-bit quantities...), it won't be part of the
> effective data, because once the transport unit is received and
> checked, it has to be cleared
Sorry, no: there is no requirement to clear it.
You are assuming something about the way data are handled. When you handle
ASCII data using octets, you can perfectly well, and conformantly, keep some
other "data" (parity or whatever) in the 8th bit; so with even parity
AT SIGN will be handled as 192, without any kind of problem (for you). It
might even be very convenient to keep this bit as it is, for example if you
know you will have to forward it to another piece of equipment along some
communication line.
In fact, there were (at least a few years ago) some mail gateways that did
exactly that, and I found out recently that this hack, which I used about
25 years ago, was not THAT good.
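To make the AT SIGN case concrete, here is a minimal sketch in Python. The helper names add_even_parity and strip_parity are mine, purely for illustration, and I am assuming the parity bit sits in the most significant bit of the octet. It shows that AT SIGN (64) travels as 192 under even parity, and that this octet is not acceptable as UTF-8 unless the 8th bit is cleared first.

```python
def add_even_parity(code: int) -> int:
    """Return the octet holding a 7-bit code plus an even-parity bit in bit 8.

    Hypothetical helper: assumes the parity bit is the most significant bit.
    """
    if not 0 <= code <= 0x7F:
        raise ValueError("not a 7-bit code")
    # The parity bit is 1 when the 7 data bits contain an odd number of ones,
    # so that the whole octet ends up with an even number of ones.
    parity = bin(code).count("1") % 2
    return code | (parity << 7)


def strip_parity(octet: int) -> int:
    """Clear the 8th bit and return the 7-bit code (hypothetical helper)."""
    return octet & 0x7F


at_sign = ord("@")                 # 64, a single bit set
octet = add_even_parity(at_sign)   # 192 once the parity bit is added

# The same octet handed to a UTF-8 decoder is rejected.
try:
    bytes([octet]).decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(octet, utf8_ok, chr(strip_parity(octet)))
```

A code with an even number of set bits, such as LATIN CAPITAL LETTER A (65), is left unchanged by the same helper.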
> By saying UTF-8 is fully compatible with ASCII, it says that any
> ASCII-only encoded file needs no reencoding of its bytes to make it
Looks like a good definition of upper (or backward, as you want) compatible.
I was tilting at "fully", particularly since the discussion was picky about
NUL wrt C.
What you are writing is that a 7-bit byte encoded in ASCII is "fully
compatible" with an 8-bit byte encoded in UTF-8... Written that way, it
looks strange, doesn't it?
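The "no reencoding" definition quoted above can be checked directly. This is only a sketch: the variable names are arbitrary, and it relies on Python's built-in "ascii" and "utf-8" codecs. It confirms that all 128 seven-bit codes, NUL included, come out as the same octets in UTF-8, and that a non-ASCII character needs the 8th bit of every octet it occupies.

```python
# All 128 seven-bit codes, NUL included, stored one per octet.
ascii_bytes = bytes(range(128))

# Decode as ASCII, then re-encode as UTF-8: the octets should be unchanged.
text = ascii_bytes.decode("ascii")
same_octets = text.encode("utf-8") == ascii_bytes

# A character outside ASCII (U+00E9) takes two octets in UTF-8,
# and both of them have the 8th bit set.
encoded = "\u00e9".encode("utf-8")
eighth_bits = [b >> 7 for b in encoded]

print(same_octets, encoded.hex(), eighth_bits)
```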
> Note that this is only true for the US version of ASCII
Anything else would be whateverSCII, but definitely not ASCII, methinks...
> "ASCII" is normally designating only the last standard US variant
Funny. "Last"... You know of /several/ variants?
I do know of several variants of ISO/IEC 646, and even of several variants
of its /reference/ version. And then there is ISO/IEC 2375, and 4873. But
that is another story entirely.
You were not saying that UTF-8 is fully compatible with *ISO/IEC 646*
instead, were you?