From: Johannes Rössel (joey@muhkuhsaft.de)
Date: Thu Nov 11 2010 - 05:08:24 CST
Hello Martin,
On 2010-11-11 04:54, "Martin J. Dürst" wrote:
> Yes, except that the terms superset/subset (and set in general)
> shouldn't be used unless you really strictly speak about the repertoire
> of characters, and not the encoding itself. So e.g. the repertoire of
> iso-8859-1 is a subset of the repertoire of UTF-8. However, iso-8859-1
> is not a subset of UTF-8, not because you can't label some text encoded
> as iso-8859-1, but because subset relationships among the encodings
> themselves don't make sense).
If you model encodings as functions, thereby making ASCII something like
ASCII ≔ { 0 ↦ '\0', ..., 32 ↦ ' ', 33 ↦ '!', 34 ↦ '"', ..., 126 ↦ '~',
127 ↦ '\x7F' }
you can definitely use the words subset and superset. Since this is just
a set of tuples that may be contained identically in other encodings
(such as UTF-8), it is appropriate to say that ASCII is a subset of
UTF-8. Of course, restricting this to the range of the function, i.e.
ran ASCII = {'\0', ..., ' ', '!', ..., '~', '\x7F' }
(sorry, borrowing some syntax from Z) allows you to make repertoire
comparisons in a sub/superset manner, making ran Latin9 a subset of ran
Unicode, even though the respective functions don't share this relationship.
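A rough sketch of the same idea in Python, in case it helps. The helper names single_byte_encoding, utf8_bmp and ran are made up for illustration, each encoding is modelled as a set of (byte sequence, character) pairs, and UTF-8 is only enumerated for the Basic Multilingual Plane to keep the set small (an assumption that is sufficient here, since a subset of a slice is a subset of the whole):

```python
def single_byte_encoding(codec):
    """Model a single-byte codec as a set of (byte sequence, character) pairs."""
    pairs = set()
    for b in range(256):
        raw = bytes([b])
        try:
            pairs.add((raw, raw.decode(codec)))
        except UnicodeDecodeError:
            pass  # this byte value is not assigned in the codec
    return pairs


def utf8_bmp():
    """UTF-8 as a set of pairs, restricted to the BMP (illustrative slice only)."""
    pairs = set()
    for cp in range(0x10000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates are not encodable in UTF-8
        ch = chr(cp)
        pairs.add((ch.encode("utf-8"), ch))
    return pairs


def ran(encoding):
    """Range of the function: the repertoire of characters it can produce."""
    return {ch for _, ch in encoding}


ASCII = single_byte_encoding("ascii")
LATIN9 = single_byte_encoding("iso8859_15")
UTF8 = utf8_bmp()

ascii_is_subset = ASCII <= UTF8                       # functions compared as sets of tuples
latin9_is_subset = LATIN9 <= UTF8                     # fails, e.g. byte 0xA4 maps to the euro sign
latin9_repertoire_is_subset = ran(LATIN9) <= ran(UTF8)  # repertoires compared
```

With these definitions the first and third comparisons hold and the second does not, which mirrors the distinction between the function and its range.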
Just a thought :-)
Regards,
Johannes