From: Hans Aberg (firstname.lastname@example.org)
Date: Sat Apr 11 2009 - 15:30:11 CDT
On 11 Apr 2009, at 21:26, Doug Ewell wrote:
>> I thought ASCII defined its characters as bytes, whereas Unicode
>> uses code-points which when mapped using UTF-8 will contain the
>> ASCII as a subset.
> The *set of characters* in ASCII is a proper and intact subset of
> Unicode. How these characters are represented inside computer
> storage and transmission protocols may be defined differently, and
> doesn't affect my argument that "ASCII characters" and "Unicode
> characters" are not disjoint sets.
> Actually, I was under the impression that ASCII was defined in terms
> of 7-bit code units, whereas there are virtually no computers or
> users today who think in terms of 7-bit code units.
Most likely, as in the past, it was common to treat the 8th bit as a
check bit - it could be altered as one pleased in transmission, depending
on how one set it. This led to MIME.
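
To illustrate the check-bit usage described above, here is a minimal sketch in Python. It assumes even parity, which is only one possible convention; the message does not say which scheme was in use, and the function names are made up for this example.

```python
def add_even_parity(code: int) -> int:
    """Return an 8-bit byte whose top bit makes the count of 1 bits even."""
    if not 0 <= code < 128:
        raise ValueError("not a 7-bit ASCII code")
    parity = bin(code).count("1") % 2   # 1 if the 7-bit code has an odd number of 1 bits
    return code | (parity << 7)         # store the parity in the 8th bit


def parity_ok(byte: int) -> bool:
    """Check that a received byte still has an even number of 1 bits."""
    return bin(byte).count("1") % 2 == 0


def strip_parity(byte: int) -> int:
    """Discard the 8th bit and recover the 7-bit ASCII code."""
    return byte & 0x7F
```

With this convention the receiver only ever interprets the low 7 bits as the character, so a channel that sets or clears the 8th bit does not change the ASCII text, but it does corrupt any data that needs all 8 bits.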
But I think because of this tie to 7-bit bytes, the formally correct
description is that there is a defined canonical injection from
the ASCII character set into the Unicode character set. It is then
common to identify ASCII with its image.
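
As a rough sketch of that injection, here it is in Python. It treats ASCII as the 7-bit codes 0..127 and uses Python's chr() as the map into Unicode; the name ascii_to_unicode is invented for this example and is not taken from any standard.

```python
def ascii_to_unicode(code: int) -> str:
    """Map a 7-bit ASCII code to the Unicode character with the same number."""
    if not 0 <= code <= 0x7F:
        raise ValueError("not a 7-bit ASCII code")
    return chr(code)


ascii_codes = range(128)
image = [ascii_to_unicode(c) for c in ascii_codes]

# Injective: distinct ASCII codes give distinct Unicode characters.
is_injective = len(set(image)) == len(image)

# Not surjective: e.g. U+00E9 is a Unicode character outside the image.
outside_image = chr(0xE9) not in image

# Under UTF-8, every character in the image is one byte equal to its ASCII code.
utf8_matches = all(ch.encode("utf-8") == bytes([c])
                   for c, ch in zip(ascii_codes, image))
```

The last check is the sense in which UTF-8 "contains ASCII as a subset": that is a property of one encoding form, while the injection itself is between the character sets.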
This archive was generated by hypermail 2.1.5 : Sat Apr 11 2009 - 15:34:20 CDT