Re: Is there a term for strictly-just-this-encoding-and-not-really-that-encoding?

From: Doug Ewell (
Date: Thu Nov 11 2010 - 11:26:08 CST

    Mark Davis 😎 wrote:

    > There are superset relations among some of the CJK character sets, and
    > also -- practically speaking -- between some of the windows and
    > ISO-8859 sets. I say practically speaking because in general
    > environments, the C1 controls are really unused, so where a
    > non-ISO-8859 set is the same except for 80..9F you can treat it
    > pragmatically as a superset.

    There was a time, about 10 years ago, when Frank da Cruz would have
    replied almost immediately about the importance of C1 controls in
    terminal environments, and the arguments about incompatibility between
    8859-1 and Windows-1252 would have been off and running.

    That was about the same time that people like Roman Czyborra were
    complaining that their terminals were scrambled by text encoded in
    UTF-8, because of its use of bytes in the 80..9F range, and people like
    Jörg Knappen were creating alternative UTFs to get around this
    perceived problem.

    Regarding the subset/superset terminology, we need to distinguish
    between "encoding subsets" and "repertoire subsets":

    * ASCII is both an encoding subset and a repertoire subset of 8859-1 and
    Windows-1252 and UTF-8.

    * 8859-1 is an encoding subset of Windows-1252, except for the 80..9F
    range.

    * 8859-1 and Windows-1252 are repertoire subsets, but not encoding
    subsets, of UTF-8.

    * 8859-15 is neither type of subset of 8859-1.

    * Etc.
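
    The distinctions in the list above can be checked mechanically. What
    follows is only a rough sketch, assuming that Python's built-in codec
    tables ("ascii", "latin-1", "cp1252", "iso8859-15", "utf-8") are
    acceptable stand-ins for the standards themselves. In particular,
    Python's "latin-1" assigns the C1 controls to 80..9F and its "cp1252"
    leaves five bytes undefined. The helper names are made up for
    illustration.

```python
# Hypothetical helpers; the codec tables are Python's, not the standards' own.

def byte_map(codec):
    """Map each assigned single byte of a one-byte codec to its character."""
    table = {}
    for b in range(256):
        try:
            table[b] = bytes([b]).decode(codec)
        except UnicodeDecodeError:
            pass  # byte is unassigned in this codec
    return table

def encode_or_none(ch, codec):
    """Return the bytes for ch in codec, or None if ch is not in its repertoire."""
    try:
        return ch.encode(codec)
    except UnicodeEncodeError:
        return None

def is_repertoire_subset(sub, sup):
    """Every character of sub can be represented somehow in sup."""
    return all(encode_or_none(ch, sup) is not None
               for ch in byte_map(sub).values())

def is_encoding_subset(sub, sup, ignore=range(0)):
    """Every character of sub has the identical byte representation in sup.

    Bytes listed in `ignore` are skipped, which models "except for 80..9F".
    """
    return all(encode_or_none(ch, sup) == bytes([b])
               for b, ch in byte_map(sub).items()
               if b not in ignore)

C1 = range(0x80, 0xA0)  # the 80..9F range discussed in this thread

if __name__ == "__main__":
    for sup in ("latin-1", "cp1252", "utf-8"):
        print("ascii vs", sup,
              is_encoding_subset("ascii", sup),
              is_repertoire_subset("ascii", sup))
    print("latin-1 vs cp1252, all bytes:",
          is_encoding_subset("latin-1", "cp1252"))
    print("latin-1 vs cp1252, skipping 80..9F:",
          is_encoding_subset("latin-1", "cp1252", ignore=C1))
    print("iso8859-15 vs latin-1:",
          is_encoding_subset("iso8859-15", "latin-1"),
          is_repertoire_subset("iso8859-15", "latin-1"))
```

    The sub argument has to be a single-byte codec, since byte_map only
    probes one byte at a time. UTF-8 is usable only as the sup argument.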

    Doug Ewell | Thornton, Colorado, USA |
    RFC 5645, 4645, UTN #14 | ietf-languages @ is dot gd slash 2kf0s
