Re: Is there a term for strictly-just-this-encoding-and-not-really-that-encoding?

From: Martin J. Dürst
Date: Wed Nov 10 2010 - 21:54:57 CST


    On 2010/11/11 6:28, Mark Davis ☕ wrote:

    > That is actually not the case. There are superset relations among some of
    > the CJK character sets, and also -- practically speaking -- between some of
    > the windows and ISO-8859 sets. I say practically speaking because in general
    > environments, the C1 controls are really unused, so where a non ISO-8859 set
    > is same except for 80..9F you can treat it pragmatically as a superset.

    Yes, except that the terms superset/subset (and set in general)
    shouldn't be used unless you are speaking strictly about the repertoire
    of characters, and not the encoding itself. So e.g. the repertoire of
    iso-8859-1 is a subset of the repertoire of UTF-8. However, iso-8859-1
    is not a subset of UTF-8 (not because you can't label some text encoded
    as iso-8859-1 as UTF-8, but because subset relationships among the
    encodings themselves don't make sense).
    Also, US-ASCII is not a subset of UTF-8, because when you just use the
    names of the character encodings, you mean the character encodings, and
    character encodings don't have subset relationships.
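
To make the repertoire/encoding distinction concrete, here is a small
sketch using Python's built-in codecs (the variable names are only for
illustration). It checks that every character in the iso-8859-1
repertoire can be written in UTF-8, while the bytes used for the same
character differ once you leave the ASCII range.

```python
# Repertoire of iso-8859-1: the character assigned to each of the 256 byte values.
repertoire = [bytes([value]).decode("iso-8859-1") for value in range(256)]

# Repertoire subset: every one of those characters can be encoded in UTF-8.
repertoire_fits_in_utf8 = True
for char in repertoire:
    try:
        char.encode("utf-8")
    except UnicodeEncodeError:
        repertoire_fits_in_utf8 = False

# The encodings themselves disagree on the bytes for the same character.
text = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE
latin1_bytes = text.encode("iso-8859-1")
utf8_bytes = text.encode("utf-8")

# The iso-8859-1 bytes are not even legal UTF-8 on their own.
try:
    latin1_bytes.decode("utf-8")
    latin1_bytes_are_legal_utf8 = True
except UnicodeDecodeError:
    latin1_bytes_are_legal_utf8 = False
```

The first check concerns only which characters exist, which is where
"subset" is meaningful. The last two concern byte sequences, which is
where it stops being meaningful.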

    It may well be possible to use (create?) the term sub-encoding,
    saying that an encoding A is a sub-encoding of encoding B if all (legal)
    byte sequences in encoding A are also legal byte sequences in encoding B
    and are interpreted as the same characters in both cases. In this sense,
    US-ASCII is clearly a sub-encoding of UTF-8, as well as a sub-encoding
    of many other encodings. You can also say that iso-8859-1 is a
    sub-encoding of windows-1252 if the former is interpreted as not
    including the C1 range.
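
The definition above can be sketched as a check, under a few loud
assumptions: the function names `legal_bytes` and `is_sub_encoding`
are made up for this illustration, Python's built-in codec tables
stand in for the encodings, and encoding A is assumed to be a
single-byte encoding, so only one-byte sequences are compared. It is
not a general test for multi-byte or stateful encodings.

```python
def legal_bytes(encoding, excluded=()):
    """Map each byte value that decodes on its own in `encoding` to its
    character, skipping any values listed in `excluded`."""
    table = {}
    for value in range(256):
        if value in excluded:
            continue
        try:
            table[value] = bytes([value]).decode(encoding)
        except UnicodeDecodeError:
            pass  # not a legal single-byte sequence in this encoding
    return table


def is_sub_encoding(a, b, excluded=()):
    """True if every legal byte of single-byte encoding `a` (minus
    `excluded`) is also legal in `b` and decodes to the same character."""
    for value, char in legal_bytes(a, excluded).items():
        try:
            if bytes([value]).decode(b) != char:
                return False
        except UnicodeDecodeError:
            return False
    return True


C1_RANGE = range(0x80, 0xA0)

ascii_in_utf8 = is_sub_encoding("ascii", "utf-8")
latin1_in_utf8 = is_sub_encoding("iso-8859-1", "utf-8")
latin1_in_cp1252 = is_sub_encoding("iso-8859-1", "cp1252")
latin1_without_c1_in_cp1252 = is_sub_encoding("iso-8859-1", "cp1252", C1_RANGE)
```

With Python's codec tables, the US-ASCII check succeeds, the
iso-8859-1 against UTF-8 check fails, and iso-8859-1 against
windows-1252 succeeds only when the C1 range is excluded.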

    Regards, Martin.

    #-# Martin J. Dürst, Professor, Aoyama Gakuin University
