Re: Is there a term for strictly-just-this-encoding-and-not-really-that-encoding?

From: Bjoern Hoehrmann
Date: Wed Nov 10 2010 - 23:11:30 CST

    * Jim Monty wrote:
    >Is there a standard term to describe text that is in some subset CCS of another
    >CCS but, strictly speaking, is only really in the subset CCS because it doesn't
    >have any characters in it other than those represented in the smaller CCS?
    >(The fact that I struggled to phrase this question in a way that made my meaning
    >clear -- and failed -- is precisely my dilemma.)
    >Text that has in it only characters that are in the
    >ASCII character encoding is also in the ISO 8859-1 character encoding and the
    >UTF-8 character encoding form of the Unicode coded character set, right? I often
    >need to talk and write about text that has such multiple personalities, but I
    >invariably struggle to make my point clearly and succinctly. I wind up
    >describing the notion of it in awkwardly verbose detail.

    You are asking for a term to say something unambiguously ("just this"),
    but then tell us that you wish to talk about ambiguity ("multiple"). If
    you want to talk about "just this" then there is no specific instance of
    "text", so the problem "this is X but it could also be Y or Z" does not
    arise. If you want to talk about "multiple" then you lack a frame of
    reference and all the "multiple" are equivalent.

    Fundamentally, I do not think it makes sense to say that some text is in
    some encoding. Text is text; you wouldn't pick up a dead-tree kind of
    book and say "Oh, this is UTF-8 and US-ASCII and ISO-8859-1 encoded"
    because it uses only letters found in the ASCII repertoire.

    If you have a container that contains only bit strings that are UTF-8
    encoded sequences of Unicode scalar values, then do not talk about any
    specific thing that could go in that container.

    If you have a specific sequence of Unicode scalar values and a string of
    bits, and want to point out that for that specific bit string many
    encodings map the string to the same sequence of Unicode scalar values,
    then I do not see why you would need a specific term.
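
To make the last case concrete, here is a minimal Python sketch, not part of the original message, of one bit string that several encodings map to the same sequence of Unicode scalar values. The helper names `decodings` and `agree` are hypothetical, chosen only for illustration, and the three encodings are simply the ones named in the question.

```python
def decodings(data: bytes, encodings=("ascii", "iso-8859-1", "utf-8")) -> dict:
    """Decode the same bytes under each encoding; None marks an invalid byte sequence."""
    results = {}
    for name in encodings:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


def agree(data: bytes, encodings=("ascii", "iso-8859-1", "utf-8")) -> bool:
    """True if every encoding accepts the bytes and yields the same text."""
    values = list(decodings(data, encodings).values())
    return None not in values and len(set(values)) == 1


# Bytes in the range 0x00-0x7F only: all three encodings agree.
print(agree(b"plain text"))

# UTF-8 bytes for a non-ASCII letter: the encodings disagree.
print(agree("Björn".encode("utf-8")))
print(decodings("Björn".encode("utf-8")))
```

The check is a property of one specific byte string compared across a chosen set of encodings, which is the frame of reference the paragraph above says is needed.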

    Perhaps <> is relevant

    Björn Höhrmann
    Am Badedeich 7 · Telefon: +49(0)160/4415681
    25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78
