RE: Is there a term for strictly-just-this-encoding-and-not-really-that-encoding?

From: Shawn Steele (Shawn.Steele@microsoft.com)
Date: Wed Nov 10 2010 - 15:53:02 CST


    Or did you mean "this is UTF-8 even though it only has characters that also look like ASCII?" I was a bit confused :)

    If you are communicating this information, then that's probably a good time to also communicate "Use Unicode, like UTF-8, and you won't have this kind of problem!"

    -Shawn

    -----Original Message-----
    From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On Behalf Of Asmus Freytag
    Sent: Wednesday, November 10, 2010 12:39 PM
    To: Jim Monty
    Cc: unicode@unicode.org
    Subject: Re: Is there a term for strictly-just-this-encoding-and-not-really-that-encoding?

    If you want to get that point across to a general audience, you could use a more colloquial term, albeit one that itself derives from mathematics.

    Text that can be completely expressed in ASCII fits into something (ASCII) that works as a "lowest common denominator" of a large number of character sets.

    You could call it "lowest common denominator" text.

    Since ASCII is the only set that exhibits such a lowest common denominator relationship with enough other sets to make it interesting, and since that relation is so well known, it's usually enough to just refer to it by name (ASCII) without needing a general term - except perhaps for general audiences that aren't very familiar with it.

    In these kinds of discussions I find it invariably useful to mention that the copyright sign is not part of ASCII. (I suspect that it's the most common character that makes a text lose its "lowest common denominator" status).
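
To make the "lowest common denominator" idea concrete, here is a minimal Python sketch. The function names and sample strings are illustrative assumptions, not terms from this thread. It checks whether a text stays within ASCII and whether ASCII, ISO 8859-1, and UTF-8 all encode it to the same bytes, then shows how the copyright sign breaks that agreement.

```python
def is_lowest_common_denominator(text: str) -> bool:
    """Return True if every character is in the ASCII range U+0000..U+007F."""
    return all(ord(ch) < 0x80 for ch in text)


def encodings_agree(text: str) -> bool:
    """Return True if ASCII, ISO 8859-1, and UTF-8 yield identical bytes."""
    try:
        encoded = {text.encode(name) for name in ("ascii", "iso-8859-1", "utf-8")}
    except UnicodeEncodeError:
        # At least one of the encodings cannot represent the text at all.
        return False
    return len(encoded) == 1


plain = "Copyright 2010"   # hypothetical sample: ASCII characters only
signed = "\u00a9 2010"     # hypothetical sample: starts with the copyright sign

print(is_lowest_common_denominator(plain), encodings_agree(plain))
print(is_lowest_common_denominator(signed), encodings_agree(signed))

# The copyright sign is one byte in ISO 8859-1 but two bytes in UTF-8.
print(signed.encode("iso-8859-1"))  # b'\xa9 2010'
print(signed.encode("utf-8"))       # b'\xc2\xa9 2010'
```

The sketch compares encoded bytes rather than guessing at an encoding label, since the point in question is that the same byte sequence is valid under all three encodings only while the text stays within ASCII.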

    A./

    On 11/10/2010 11:41 AM, Jim Monty wrote:
    > Here's a peculiar question.
    >
    > Is there a standard term to describe text that is in some subset CCS
    > of another CCS but, strictly speaking, is only really in the subset
    > CCS because it doesn't have any characters in it other than those represented in the smaller CCS?
    >
    > (The fact that I struggled to phrase this question in a way that made
    > my meaning clear -- and failed -- is precisely my dilemma.)
    >
    > Text that has in it only characters that are in the ASCII character
    > encoding is also in the ISO 8859-1 character encoding and the
    > UTF-8 character encoding form of the Unicode coded character set,
    > right? I often need to talk and write about text that has such
    > multiple personalities, but I invariably struggle to make my point
    > clearly and succinctly. I wind up describing the notion of it in awkwardly verbose detail.
    >
    > So I'm left wondering if the character encoding cognoscenti have a
    > special utilitarian word for this, maybe one borrowed from mathematics (set theory).
    >
    > Jim Monty
    >


