Re: Is there a term for strictly-just-this-encoding-and-not-really-that-encoding?

From: Johannes Rössel (joey@muhkuhsaft.de)
Date: Thu Nov 11 2010 - 05:08:24 CST

Next message: Johannes Rössel: "Re: Is there a term for strictly-just-this-encoding-and-not-really-that-encoding?"

Previous message: Khaled Hosny: "Re: Combining Triple Diacritics (N3915) not accepted by UTC #125"
In reply to: Martin J. Dürst: "Re: Is there a term for strictly-just-this-encoding-and-not-really-that-encoding?"
Next in thread: Johannes Rössel: "Re: Is there a term for strictly-just-this-encoding-and-not-really-that-encoding?"
Reply: Johannes Rössel: "Re: Is there a term for strictly-just-this-encoding-and-not-really-that-encoding?"

Hello Martin,

On 2010-11-11 04:54, "Martin J. Dürst" wrote:
> Yes, except that the terms superset/subset (and set in general)
> shouldn't be used unless you really strictly speak about the repertoire
> of characters, and not the encoding itself. So e.g. the repertoire of
> iso-8859-1 is a subset of the repertoire of UTF-8. However, iso-8859-1
> is not a subset of UTF-8, not because you can't label some text encoded
> as iso-8859-1, but because subset relationships among the encodings
> themselves don't make sense).

If you model encodings as functions, thereby making ASCII something like

ASCII ≔ { 0 ↦ '\0', ..., 32 ↦ ' ', 33 ↦ '!', 34 ↦ '"', ..., 126 ↦ '~', 127 ↦ '\x7F' }

you can definitely use the words subset and superset. Since this is just
a set of tuples that may be contained identically in other encodings
(such as UTF-8), it is appropriate to say that ASCII is a subset of
UTF-8. Of course, restricting this to the range of the function, i.e.

ran ASCII = { '\0', ..., ' ', '!', ..., '~', '\x7F' }

(sorry, borrowing some syntax from Z) allows you to make repertoire
comparisons in a sub/superset manner, making ran Latin9 a subset of ran
Unicode, even though the respective functions don't share this relationship.
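To make the distinction concrete, here is a rough Python sketch of the same idea. It assumes an encoding is modelled as a dict from byte sequences to characters, and it only looks at code points below U+3000 to keep the tables small. The names table, is_subset and ran are made up for this illustration, and the byte-sequence keys are my choice where the notation above uses plain numbers.

```python
def table(codec, code_points):
    """Return {encoded bytes: character} for every code point the codec can encode."""
    mapping = {}
    for cp in code_points:
        ch = chr(cp)
        try:
            mapping[ch.encode(codec)] = ch
        except UnicodeEncodeError:
            pass  # character is outside this encoding's repertoire
    return mapping


# Assumption: a slice of Unicode that is large enough to contain all of Latin-9
# (including the euro sign, U+20AC) and that contains no surrogates.
SAMPLE = range(0x3000)

ascii_map = table("ascii", SAMPLE)
latin9_map = table("iso8859_15", SAMPLE)
utf8_map = table("utf-8", SAMPLE)


def is_subset(f, g):
    """True if every (bytes, character) pair of f is also a pair of g."""
    return set(f.items()) <= set(g.items())


def ran(f):
    """Range of the function: the repertoire of characters it can produce."""
    return set(f.values())


print(is_subset(ascii_map, utf8_map))       # ASCII pairs all occur in UTF-8
print(is_subset(latin9_map, utf8_map))      # Latin-9 pairs do not
print(ran(latin9_map) <= ran(utf8_map))     # but the repertoire is contained
```

The second check fails because, for example, the single byte 0xA4 maps to the euro sign in Latin-9 but is not a complete UTF-8 sequence on its own, while the third check only compares the characters and ignores how they are encoded.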

Just a thought :-)

Regards,
Johannes

This archive was generated by hypermail 2.1.5 : Thu Nov 11 2010 - 05:13:08 CST