Re: Encoding of Logos, Personal Gaiji et cetera for electronic library archiving (formerly Re: Hypersurrogates)

From: Asmus Freytag
Date: Sat Aug 29 2009 - 04:49:12 CDT


    On 8/28/2009 11:26 PM, William_J_G Overington wrote:
    > It is now clear to me that hypersurrogates as I was thinking of them yesterday could not be encoded in a future version of Unicode.
    > I have thought further on the problem and have thought that, even though use of codes above U+10FFFF is not possible, there is an alternative mechanism that could solve the encoding problem without breaking "interoperability between UTF-16, UTF-8, and UTF-32 forms of the encoding".
    > Suppose that such items as logos and personal Gaiji could each be encoded as a sequence of two Unicode codepoints, one from plane 10 followed by one from plane 11 and that such a sequence would not imply any other codepoint, it would just be an ordered sequence of two codepoints, so that the character would be encoded at a point within a two-dimensional space.
    In principle, you can reference (let's leave the word "encode" out of
    this for a while) anything by labeling it with a string. HTML uses
    entities, such as "&quot;", where the string is delimited by the two
    characters "&" and ";". Many bulletin board and email implementations
    will interpret strings like :) as glyphs (you may not see the ":"
    followed by ")" here). These strings have no standard escape
    characters. Some support more regular string formats as well, as in
    ":shocked:". So that idea is not new.
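    The string-label mechanism described above can be sketched as follows.
    This is a minimal illustration, not anything from the original message:
    the `SHORTCODES` table and the `expand_shortcodes` helper are assumed
    names, and the mapping is invented for the example.

```python
import re

# Hypothetical mapping of ":name:"-style labels to symbols, the same
# higher-level mechanism bulletin boards use for codes like ":shocked:".
SHORTCODES = {
    "shocked": "\U0001F632",
    "smile": "\U0001F642",
}

def expand_shortcodes(text):
    """Replace :name: tokens with their mapped symbols.

    Unknown tokens are left untouched -- consistent with the point that
    these ad-hoc schemes have no standard escape character, so there is
    no way to write a literal ":shocked:" that is guaranteed to survive.
    """
    def repl(match):
        name = match.group(1)
        return SHORTCODES.get(name, match.group(0))
    return re.sub(r":(\w+):", repl, text)
```

    Note that the substitution happens entirely above the character
    encoding: the input and output are both plain Unicode strings, and a
    receiver that does not run this protocol simply sees ":shocked:" as
    nine ordinary characters.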

    Common to all of these approaches is that they are *higher* level
    protocols. There's always a more basic level where "&" is just the
    character "&", and only when you claim to support HTML do you have to
    turn the real "&" into "&amp;" to distinguish it from the syntax
    character.
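    The layering above can be shown concretely with Python's standard
    `html` module: at the basic level "&" is just a character, and only
    when the text is to be interpreted as HTML must it be written
    "&amp;" so it is not mistaken for the start of an entity.

```python
import html

raw = "AT&T"

# Under the HTML protocol, the literal "&" must be escaped...
escaped = html.escape(raw)      # "AT&amp;T"

# ...and a consumer of HTML undoes the escaping to recover the
# basic-level character data.
roundtrip = html.unescape(escaped)
```

    The escaping is reversible precisely because HTML reserves "&" as a
    syntax character and provides "&amp;" as its escape, which is what
    the ad-hoc ":)"-style schemes lack.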

    However, if you start reserving code points on certain planes for such a
    scheme and attempt to enlist the Unicode Consortium (owners of an
    encoding standard) into this process, it's impossible for people *not*
    to perceive this as *encoding*. That is the reason why these kinds of
    proposals will forever be non-starters in *this* particular context.
    > If an even larger codespace were required,
    That's the other reason: after 20 years of effort, the Consortium has
    barely managed to encode as many characters as there are "private use"
    code points in the standard. Your worries about code space extensions
    are premature. Really.


    This archive was generated by hypermail 2.1.5 : Sat Aug 29 2009 - 04:52:30 CDT