From: Johannes Rössel (joey@muhkuhsaft.de)
Date: Thu Nov 11 2010 - 05:08:24 CST
Hello Martin,
On 2010-11-11 04:54, "Martin J. Dürst" wrote:
 > Yes, except that the terms superset/subset (and set in general)
 > shouldn't be used unless you really strictly speak about the repertoire
 > of characters, and not the encoding itself. So e.g. the repertoire of
 > iso-8859-1 is a subset of the repertoire of UTF-8. However, iso-8859-1
 > is not a subset of UTF-8, not because you can't label some text encoded
 > as iso-8859-1, but because subset relationships among the encodings
 > themselves don't make sense).
If you model encodings as functions, thereby making ASCII something like
     ASCII ≔ { 0 ↦ '\0', ..., 32 ↦ ' ', 33 ↦ '!', 34 ↦ '"', ..., 126 ↦ '~', 127 ↦ '\x7f' }
you can definitely use the words subset and superset. Since this is just 
a set of tuples that may be contained identically in other encodings
(such as UTF-8), it is appropriate to say that ASCII is a subset of 
UTF-8. Of course, restricting this to the range of the function, i.e.
     ran ASCII = { '\0', ..., ' ', '!', ..., '~', '\x7f' }
(sorry, borrowing some syntax from Z) allows you to make repertoire 
comparisons in a sub/superset manner, making ran Latin9 a subset of ran 
Unicode, even though the respective functions don't share this relationship.
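
To make the distinction concrete, here is a rough Python sketch of the idea. It is only an illustration under a few assumptions of mine: each encoding is modelled as a finite function from byte sequences to characters (rather than from numbers, as in the notation above), UTF-8 stands in for Unicode, and only code points below U+2100 are sampled. The helper name encoding_as_function is made up for this example.

```python
# Model an encoding as a function: a dict from byte sequence to character.
# Assumption: only code points below U+2100 are sampled, which is enough
# to cover ASCII, Latin-9 (ISO-8859-15) and the euro sign.
SAMPLE = range(0x2100)


def encoding_as_function(codec, code_points=SAMPLE):
    """Return {encoded bytes: character} for every sampled character
    that the given codec can represent (hypothetical helper)."""
    mapping = {}
    for cp in code_points:
        ch = chr(cp)
        try:
            mapping[ch.encode(codec)] = ch
        except UnicodeEncodeError:
            pass  # character is outside this encoding's repertoire
    return mapping


ASCII = encoding_as_function("ascii")
LATIN9 = encoding_as_function("iso8859-15")
UTF8 = encoding_as_function("utf-8")

# Function view: compare the sets of (bytes, character) tuples.
ascii_in_utf8 = ASCII.items() <= UTF8.items()
latin9_in_utf8 = LATIN9.items() <= UTF8.items()

# Repertoire view: compare only the ranges ("ran") of the functions.
ran_latin9_in_ran_utf8 = set(LATIN9.values()) <= set(UTF8.values())

print("ASCII subset of UTF-8 (as functions):  ", ascii_in_utf8)
print("Latin-9 subset of UTF-8 (as functions):", latin9_in_utf8)
print("ran Latin-9 subset of ran UTF-8:       ", ran_latin9_in_ran_utf8)
```

In this model the tuple b'\xa4' ↦ '€' belongs to Latin-9, whereas UTF-8 maps '€' to a three-byte sequence, so the functions are not in a subset relationship even though the repertoires are.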
Just a thought :-)
Regards,
Johannes