Re: My Querry

From: Antoine Leca (Antoine10646@leca-marti.org)
Date: Tue Nov 23 2004 - 18:21:54 CST

    Philippe Verdy wrote:

    > From: "Antoine Leca" <Antoine10646@leca-marti.org>
    >> For example, ASCII as designed allowed (please note I did not write
    >> "was designed to allow") the use of the 8th bit as parity bit when
    >> transmitted as octet on a telecommunication line; I doubt such use is
    >> compatible with UTF-8.
    >
    > The parity bit is not data; it's a framing bit used for transport/link
    > purpose only.

    Did I say otherwise?
    Even if it is not "data", you can store it inside an octet, along with 7
    bits of /data/. You cannot do something similar if you have 8 bits of data,
    it won't fit inside the octet. Which was my point.

    > ASCII is 7 bit only, so even if a parity bit is added (parity bit can
    > be added as well to 8-bit quantities...), it won't be part of the
    > effective data, because once the transport unit is received and
    > checked, it has to be cleared

    Sorry, no: there is no requirement to clear it.
    You are assuming something about the way data are handled. When you handle
    ASCII data using octets, you can perfectly well, and conformantly, keep
    some other "data" (parity or whatever) in the 8th bit; so with even parity
    AT SIGN (0x40) will be handled as 192 (0xC0), without any kind of problem
    (for you). It might even be very convenient to keep this bit as it is, for
    example if you know you will have to forward it to another piece of
    equipment over some communication line.
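
    The AT SIGN case can be checked with a minimal sketch. The helper names
    (with_even_parity, strip_parity, is_valid_utf8) are made up for this
    illustration, and the parity bit is assumed to sit in the high bit of
    the octet:

```python
def with_even_parity(code: int) -> int:
    """Return the octet holding a 7-bit code plus an even-parity bit in the 8th bit."""
    if not 0 <= code <= 0x7F:
        raise ValueError("not a 7-bit code")
    # The parity bit is set when the 7 data bits contain an odd number of ones,
    # so that the whole octet ends up with an even number of ones.
    parity = bin(code).count("1") % 2
    return code | (parity << 7)


def strip_parity(octet: int) -> int:
    """Recover the 7-bit code by masking off the 8th bit."""
    return octet & 0x7F


def is_valid_utf8(data: bytes) -> bool:
    """Report whether the byte string decodes as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


at_sign = ord("@")                 # 0x40, a single bit set
octet = with_even_parity(at_sign)  # parity bit added: 0xC0, i.e. 192

print(octet)                          # 192
print(strip_parity(octet) == at_sign) # True
print(is_valid_utf8(bytes([octet])))  # False: a lone 0xC0 is not valid UTF-8
```

    The octet with the parity bit kept is still a perfectly usable container
    for the ASCII character, but a UTF-8 decoder rejects it, since all 8 bits
    are data there.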

    In fact, there were (at least a few years ago) some mail gateways that did
    exactly that, and I found out recently that this hack, which I used about
    25 years ago, was not THAT good.
    ;-)

    > By saying UTF-8 is fully compatible with ASCII, it says that any
    > ASCII-only encoded file needs no reencoding of its bytes to make it
    > UTF-8.

    Looks like a good definition of upward (or backward, if you prefer)
    compatible. I was tilting at "fully", particularly since the discussion was
    picky about NUL with respect to C.
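
    The quoted definition (no re-encoding needed for ASCII-only bytes) can be
    illustrated with a short sketch. It assumes the 8th bit of every octet is
    clear, and the variable names are chosen only for this example:

```python
# All 128 ASCII codes, NUL included, each stored in an octet with the 8th bit clear.
ascii_bytes = bytes(range(128))

as_ascii = ascii_bytes.decode("ascii")
as_utf8 = ascii_bytes.decode("utf-8")

# The same octets give the same characters under both decoders...
same_text = as_ascii == as_utf8

# ...and encoding that text as UTF-8 gives back the original octets unchanged.
round_trip = as_utf8.encode("utf-8") == ascii_bytes

# NUL stays a single zero octet in UTF-8.
nul_is_one_zero_byte = "\x00".encode("utf-8") == b"\x00"

print(same_text, round_trip, nul_is_one_zero_byte)
```

    This covers the compatibility in that one direction only; it says nothing
    about octets whose 8th bit carries something else.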

    What you are writing is that a 7-bit byte encoded in ASCII is "fully
    compatible" with an 8-bit byte encoded in UTF-8... Written that way, it
    looks strange, doesn't it?

    > Note that this is only true for the US version of ASCII

    Anything else would be whateverSCII, but definitely not ASCII, methinks...

    > "ASCII" is normally designating only the last standard US variant

    Funny. "Last"... You know of /several/ variants?
    I do know of several variants of ISO/IEC 646, and even of several variants
    of its /reference/ version. And then there is ISO/IEC 2375, and 4873. But
    that is another story entirely.

    You were not saying that UTF-8 is fully compatible with *ISO/IEC 646*
    instead, were you?

    Antoine


