Re: ASCII as a subset of Unicode (was: Re: Oxford proposes a leaner alphabet)

From: Hans Aberg (
Date: Sat Apr 11 2009 - 15:30:11 CDT

    On 11 Apr 2009, at 21:26, Doug Ewell wrote:

    >> I thought ASCII defined its characters as bytes, whereas Unicode
    >> uses code-points which when mapped using UTF-8 will contain the
    >> ASCII as a subset.
    > The *set of characters* in ASCII is a proper and intact subset of
    > Unicode. How these characters are represented inside computer
    > storage and transmission protocols may be defined differently, and
    > doesn't affect my argument that "ASCII characters" and "Unicode
    > characters" are not disjoint sets.

    > Actually, I was under the impression that ASCII was defined in terms
    > of 7-bit code units, whereas there are virtually no computers or
    > users today who think in terms of 7-bit code units.

    Most likely, as in the past it was common to treat the 8th bit as a
    check bit - it could be altered as one pleased in transmission, depending
    on how one set it. This led to MIME.
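
    As a rough illustration of the 8th bit acting as a check bit, here is
    a minimal Python sketch. It assumes even parity, which is only one
    of several conventions that were in use; the function names are
    made up for this example.

```python
def with_even_parity(code: int) -> int:
    """Set bit 7 so that the total number of 1 bits in the byte is even.

    Assumption: even parity. Odd, mark and space parity were also used.
    """
    if not 0 <= code <= 0x7F:
        raise ValueError("not a 7-bit ASCII code")
    parity = bin(code).count("1") % 2
    return code | (parity << 7)


def strip_eighth_bit(byte: int) -> int:
    """Recover the 7-bit ASCII code, whatever the 8th bit was set to."""
    return byte & 0x7F


sent = [with_even_parity(ord(ch)) for ch in "AC"]
received = "".join(chr(strip_eighth_bit(b)) for b in sent)
```

    The point is only that the low 7 bits carry the character, while the
    8th bit may differ from one link to the next.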

    But I think because of this tie to 7-bit bytes, the formally correct
    description is that there is a defined canonical injection from
    the ASCII character set into the Unicode character set. It is then
    common to identify ASCII with its image.
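
    The injection can be sketched in Python, where chr() maps an integer
    to the Unicode character with that code point. The helper name
    ascii_to_unicode is hypothetical, and this is an illustration
    rather than a formal definition.

```python
def ascii_to_unicode(code: int) -> str:
    """Map a 7-bit ASCII code to the Unicode character with the same number."""
    if not 0 <= code <= 0x7F:
        raise ValueError("not a 7-bit ASCII code")
    return chr(code)


ascii_codes = list(range(128))
image = [ascii_to_unicode(c) for c in ascii_codes]

# Injective: 128 distinct codes map to 128 distinct characters.
is_injective = len(set(image)) == len(ascii_codes)

# The image is exactly the code points U+0000..U+007F.
image_code_points = [ord(ch) for ch in image]

# Under UTF-8, each character in the image is one byte equal to its ASCII code.
utf8_matches_ascii = all(
    ch.encode("utf-8") == bytes([c]) for c, ch in zip(ascii_codes, image)
)
```

    Identifying ASCII with its image then amounts to treating the code
    c and the character chr(c) as the same thing for c in 0..127.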


