Encoding of Logos, Personal Gaiji et cetera for electronic library archiving (formerly Re: Hypersurrogates)

From: William_J_G Overington (wjgo_10009@btinternet.com)
Date: Sat Aug 29 2009 - 01:26:44 CDT


    Kenneth Whistler kindly answered my question and added some further notes.

    It is now clear to me that hypersurrogates as I was thinking of them yesterday could not be encoded in a future version of Unicode.

    I have thought further about the problem. Even though codes above U+10FFFF cannot be used, I think there is an alternative mechanism that could solve the encoding problem without breaking "interoperability between UTF-16, UTF-8, and UTF-32 forms of the encoding".

    Suppose that items such as logos and personal Gaiji could each be encoded as a sequence of two Unicode codepoints, one from plane 10 followed by one from plane 11. Such a sequence would not imply any other codepoint; it would simply be an ordered pair of codepoints, so that each item would be encoded at a point within a two-dimensional space. Rendering for display could be carried out using an advanced format font with glyph substitution technology. Indeed, this would be simpler than glyph substitution for Latin ligatures, where care is needed to look for ffi before looking for ff: with the suggested encoding, the order of lookup would not matter.
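
    A minimal sketch in Python of the pair idea described above. The function names, the base constants and the choice to skip the last two codepoints of each plane (which are noncharacters) are assumptions made for illustration; they are not part of any standard or of the proposal itself. The sketch maps a position in the two-dimensional space to a plane 10 plus plane 11 sequence and checks that the sequence survives a round trip through UTF-8, UTF-16 and UTF-32.

```python
from typing import Tuple

PLANE_10_BASE = 0xA0000  # first codepoint of plane 10
PLANE_11_BASE = 0xB0000  # first codepoint of plane 11
MAX_OFFSET = 0xFFFD      # assumption: skip U+nFFFE and U+nFFFF (noncharacters)


def encode_pair(row: int, col: int) -> str:
    """Map a (row, col) position to a plane 10 + plane 11 sequence."""
    if not (0 <= row <= MAX_OFFSET and 0 <= col <= MAX_OFFSET):
        raise ValueError("row and col must be in the range 0..0xFFFD")
    return chr(PLANE_10_BASE + row) + chr(PLANE_11_BASE + col)


def decode_pair(seq: str) -> Tuple[int, int]:
    """Recover (row, col) from a two-codepoint sequence."""
    if len(seq) != 2:
        raise ValueError("expected exactly two codepoints")
    first, second = ord(seq[0]), ord(seq[1])
    if first >> 16 != 10 or second >> 16 != 11:
        raise ValueError("expected plane 10 followed by plane 11")
    return first - PLANE_10_BASE, second - PLANE_11_BASE


seq = encode_pair(0x0001, 0x0002)
for codec in ("utf-8", "utf-16-le", "utf-32-le"):
    data = seq.encode(codec)
    # The same two codepoints round-trip unchanged in every encoding form.
    assert data.decode(codec) == seq
assert decode_pair(seq) == (1, 2)
```

    Because both codepoints are ordinary supplementary-plane scalar values, no encoding form needs special handling; the pairing exists only at the level of interpretation and rendering.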

    If an even larger codespace were required, then triples from planes 7, 8 and 9 could be used. This would not introduce any dependence on lookup order, because any sequence starting with a codepoint from plane 10 would by definition be a pair: one codepoint from plane 10 followed by one codepoint from plane 11.
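
    The point about lookup order can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `PATTERNS`, `TRAILING_ONLY` and `segment` are hypothetical, and the decision to reject a stray plane 8, 9 or 11 codepoint is a design choice of the sketch rather than something stated in the proposal. The plane of the leading codepoint alone determines how long the sequence is, so no longest-match ordering is needed.

```python
from typing import List

# Assumption: plane of the leading codepoint -> planes of the whole sequence.
PATTERNS = {10: (10, 11), 7: (7, 8, 9)}
# Planes that may only appear inside a sequence, never at its start.
TRAILING_ONLY = {8, 9, 11}


def plane(ch: str) -> int:
    """Return the plane number (0..16) of a single codepoint."""
    return ord(ch) >> 16


def segment(text: str) -> List[str]:
    """Split text into ordinary characters, pairs and triples."""
    units = []
    i = 0
    while i < len(text):
        lead = plane(text[i])
        pattern = PATTERNS.get(lead)
        if pattern is None:
            if lead in TRAILING_ONLY:
                raise ValueError("trailing codepoint without a lead at index %d" % i)
            units.append(text[i])  # ordinary character, taken on its own
            i += 1
            continue
        chunk = text[i:i + len(pattern)]
        if tuple(plane(c) for c in chunk) != pattern:
            raise ValueError("incomplete or malformed sequence at index %d" % i)
        units.append(chunk)
        i += len(pattern)
    return units


pair = chr(0xA0001) + chr(0xB0002)
triple = chr(0x70003) + chr(0x80004) + chr(0x90005)
assert segment("A" + pair + triple + "B") == ["A", pair, triple, "B"]
```

    A single pass with one table lookup per sequence is enough here, in contrast to the ffi-before-ff ordering needed for ligature substitution.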

    I feel that archiving of electronic documents is important, and that Unicode needs a way to handle all of the items needed in documents using a permanently valid, unambiguous coding.

    There may be various intellectual property issues and other legal issues to resolve. For example, encoding a logo using a sequence of a plane 10 codepoint followed by a plane 11 codepoint should not imply that anyone may use that logo for any purpose.

    William Overington

    29 August 2009

    --- On Friday 28 August 2009, Kenneth Whistler <kenw@sybase.com> wrote:

    > From: Kenneth Whistler <kenw@sybase.com>
    > Subject: Re: Hypersurrogates
    > To: wjgo_10009@btinternet.com
    > Cc: unicode@unicode.org, kenw@sybase.com
    > Date: Friday, 28 August, 2009, 7:05 PM
    >
    > William Overington asked:
    > > I write to ask please as to whether anyone can please state
    > > what is the present situation in relation to Unicode and in
    > > relation to ISO as to whether there are or are not any
    > > permanent rulings as to which codes beyond U+10FFFF could or
    > > could not be used in a future version of the Unicode Standard?
    >
    > There are such permanent rulings. Codes beyond U+10FFFF will
    > not be used in future versions of the Unicode Standard.
    > And no one on the relevant committees (UTC and SC2/WG2) has
    > any interest in pursuing a hypersurrogate scheme. All such
    > a scheme would accomplish is to break interoperability between
    > UTF-16, UTF-8, and UTF-32 forms of the encoding.
    >
    > --Ken
