Re: Is there a term for strictly-just-this-encoding-and-not-really-that-encoding?

From: Johannes Rössel (joey@muhkuhsaft.de)
Date: Thu Nov 11 2010 - 05:08:24 CST


    Hello Martin,

    On 2010-11-11 04:54, "Martin J. Dürst" wrote:
    > Yes, except that the terms superset/subset (and set in general)
    > shouldn't be used unless you really strictly speak about the repertoire
    > of characters, and not the encoding itself. So e.g. the repertoire of
    > iso-8859-1 is a subset of the repertoire of UTF-8. However, iso-8859-1
    > is not a subset of UTF-8, not because you can't label some text encoded
    > as iso-8859-1, but because subset relationships among the encodings
    > themselves don't make sense).

    If you model encodings as functions, thereby making ASCII something like

         ASCII ≔ { 0 ↦ '\0', ..., 32 ↦ ' ', 33 ↦ '!', 34 ↦ '"', ..., 126 ↦ '~', 127 ↦ '\x7F' }

    you can definitely use the words subset and superset. Since this is just
    a set of tuples that may be contained identically in other encodings
    (such as UTF-8), it is appropriate to say that ASCII is a subset of
    UTF-8. Of course, restricting this to the range of the function, i.e.

         ran ASCII = { '\0', ..., ' ', '!', ..., '~', '\x7F' }

    (sorry, borrowing some syntax from the Z notation) allows you to make repertoire
    comparisons in a sub/superset manner, making ran Latin9 a subset of ran
    Unicode, even though the respective functions don't share this relationship.
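
    To make this concrete, here is a rough Python sketch of the idea, not a
    definitive model. It assumes an encoding can be represented as a set of
    (byte sequence, character) pairs, built from Python's own codecs and
    limited to the Basic Multilingual Plane to keep it small. The names
    encoding_as_function and ran are made up for this illustration, and
    keying on byte sequences rather than plain numbers is my own choice so
    that multi-byte UTF-8 fits the same shape.

```python
def encoding_as_function(codec, code_points=range(0x10000)):
    """Model an encoding as a set of (byte sequence, character) pairs.

    Assumption: only code points in `code_points` (the BMP by default)
    are considered; anything the codec cannot encode is simply skipped.
    """
    pairs = set()
    for cp in code_points:
        ch = chr(cp)
        try:
            pairs.add((ch.encode(codec), ch))
        except UnicodeEncodeError:
            # Outside this encoding's repertoire (or a lone surrogate).
            pass
    return pairs


def ran(function):
    """Range of the function: the repertoire of characters it can produce."""
    return {ch for _, ch in function}


ASCII = encoding_as_function("ascii")
LATIN9 = encoding_as_function("iso-8859-15")
UTF8 = encoding_as_function("utf-8")

# Every ASCII pair occurs identically in UTF-8, so the function is a subset.
print(ASCII <= UTF8)

# Latin-9 maps b'\xa4' to the euro sign, UTF-8 uses b'\xe2\x82\xac' for it,
# so the functions are not in a subset relationship ...
print(LATIN9 <= UTF8)

# ... but the repertoires (the ranges) are.
print(ran(LATIN9) <= ran(UTF8))
```

    Under these assumptions the first and third comparisons hold and the
    second does not, which is exactly the function versus range
    distinction above.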

    Just a thought :-)

    Regards,
    Johannes



    This archive was generated by hypermail 2.1.5 : Thu Nov 11 2010 - 05:13:08 CST