Re: Is there a term for strictly-just-this-encoding-and-not-really-that-encoding?

From: Mark Davis ☕ (mark@macchiato.com)
Date: Wed Nov 10 2010 - 15:28:28 CST

    Mark

    *— Il meglio è l’inimico del bene —*

    On Wed, Nov 10, 2010 at 12:38, Asmus Freytag <asmusf@ix.netcom.com> wrote:

    > If you want to get that point across to a general audience, you could use a
    > more colloquial term, albeit one that itself derives from mathematics.
    >
    > Text that can be completely expressed in ASCII fits into something
    > (ASCII) that works as a "lowest common denominator" of a large number of
    > character sets.
    >
    > You could call it "lowest common denominator" text.
    >
    > Since ASCII is the only set that exhibits such a lowest common denominator
    > relationship with enough other sets to make it interesting, and since that
    > relation is so well known, it's usually enough to just refer to it by name
    > (ASCII) without needing a general term - except perhaps for general
    > audiences that aren't very familiar with it.
    >

    That is actually not the case. There are superset relations among some of
    the CJK character sets, and also -- practically speaking -- between some of
    the Windows and ISO-8859 sets. I say practically speaking because in general
    environments the C1 controls are essentially unused, so where a non-ISO-8859
    set is the same except for 80..9F, you can treat it pragmatically as a
    superset.
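
    A minimal sketch of that pragmatic check, assuming Python's built-in
    codecs "latin-1" (ISO-8859-1) and "cp1252" (windows-1252) as stand-ins
    for the sets in question. The helper name differing_bytes is made up for
    illustration. It lists the byte values the two codecs decode differently
    and checks that none of them fall outside 80..9F:

```python
def differing_bytes(enc_a, enc_b):
    """Return the byte values that two single-byte codecs decode differently.

    A byte that one codec leaves undefined is treated as decoding to None,
    so it counts as a difference unless both codecs leave it undefined.
    """
    def decode_one(value, encoding):
        try:
            return bytes([value]).decode(encoding)
        except UnicodeDecodeError:
            return None

    return [b for b in range(256)
            if decode_one(b, enc_a) != decode_one(b, enc_b)]


diffs = differing_bytes("latin-1", "cp1252")

# Differences that fall outside the C1 control range 0x80..0x9F.
outside_c1 = [b for b in diffs if not 0x80 <= b <= 0x9F]

print("differing bytes:", [f"{b:02X}" for b in diffs])
print("differences outside 80..9F:", outside_c1)
```

    If outside_c1 comes back empty, data that never uses the C1 range reads
    the same under either label, which is the sense of "pragmatically a
    superset" above.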

    Also tricky are the 'almost' supersets, which differ in only a few
    characters. Those definitely cause problems because the difference in the
    data is almost undetectable.
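
    As an illustration of an 'almost' superset, here is a sketch assuming
    Python's built-in "latin-1" (ISO-8859-1) and "iso8859-15" codecs. That
    pair is picked here only as an example and is not one named in this
    thread; the helper names are hypothetical. The two tables differ in just
    a handful of positions, so most data gives no hint of which label is
    right:

```python
def differing_bytes(enc_a, enc_b):
    """Byte values that two fully defined single-byte codecs decode differently."""
    return [b for b in range(256)
            if bytes([b]).decode(enc_a) != bytes([b]).decode(enc_b)]


def same_under_both(data, enc_a, enc_b):
    """True if the byte string decodes to the same text under both codecs."""
    return data.decode(enc_a) == data.decode(enc_b)


diffs = differing_bytes("latin-1", "iso8859-15")
print("positions that differ:", [f"{b:02X}" for b in diffs])

# Typical Western European text avoids those positions entirely.
plain = "café".encode("latin-1")
plain_same = same_under_both(plain, "latin-1", "iso8859-15")

# One euro sign is enough to make the two readings diverge.
price = "10 €".encode("iso8859-15")
price_same = same_under_both(price, "latin-1", "iso8859-15")

print("plain text identical:", plain_same)
print("text with euro sign identical:", price_same)
```

    Only when one of the few differing bytes actually occurs does the
    mislabeling show up, and then as a single wrong character rather than
    obvious garbage.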

    >
    > In these kinds of discussions I find it invariably useful to mention that
    > the copyright sign is not part of ASCII. (I suspect that it's the most
    > common character that makes a text lose its "lowest common denominator"
    > status).
    >
    > A./
    >
    > On 11/10/2010 11:41 AM, Jim Monty wrote:
    >
    >> Here's a peculiar question.
    >>
    >> Is there a standard term to describe text that is in some subset CCS of
    >> another CCS but, strictly speaking, is only really in the subset CCS
    >> because it doesn't have any characters in it other than those
    >> represented in the smaller CCS?
    >>
    >> (The fact that I struggled to phrase this question in a way that made my
    >> meaning clear -- and failed -- is precisely my dilemma.)
    >>
    >> Text that has in it only characters that are in the ASCII character
    >> encoding is also in the ISO 8859-1 character encoding and the UTF-8
    >> character encoding form of the Unicode coded character set, right? I
    >> often need to talk and write about text that has such multiple
    >> personalities, but I invariably struggle to make my point clearly and
    >> succinctly. I wind up describing the notion of it in awkwardly verbose
    >> detail.
    >>
    >> So I'm left wondering if the character encoding cognoscenti have a special
    >> utilitarian word for this, maybe one borrowed from mathematics (set
    >> theory).
    >>
    >> Jim Monty



    This archive was generated by hypermail 2.1.5 : Wed Nov 10 2010 - 15:31:26 CST