Re: Invalid code points

From: verdy_p (verdy_p@wanadoo.fr)
Date: Thu Jun 04 2009 - 21:22:42 CDT


    > Message of 04/06/09 17:41
    > From: "Asmus Freytag"
    > To: "William_J_G Overington"
    > Cc: "Kenneth Whistler", unicode@unicode.org, "Ruszlán Gaszanov", "Hans Aberg", "Doug Ewell"
    > Subject: Re: Invalid code points
    >
    >
    > On 6/4/2009 2:13 AM, William_J_G Overington wrote:
    > >
    > > Well, no, because the suggestion of using U+FFFC does have a clue for humans as to what might be meant.
    > >
    > >
    > >
    > Not really. U+FFFC is supposed to stand on its own in a *text* stream,
    > with all the data being in *another* stream.
    >
    > Your parameter passing example has a single text stream.
    >
    > Despite what you might want to think, U+FFFC is not the "start of object
    > data marker" that you would like it to be.
    >
    > A./

    I also agree: the only real use I see for U+FFFC is as a placeholder marking the position where an external
    binary object is to be inserted. A single character may be needed when several objects are to be inserted at the
    same position, and when that insertion position is not already encoded in the upper-layer envelope format
    containing the plain-text stream and the object streams.
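
    The placeholder role described above can be sketched in a few lines of Python. This is only an illustration:
    the function names split_streams and merge_streams, and the convention that the n-th U+FFFC pairs with the n-th
    object, are assumptions made for this example, not something the standard prescribes.

```python
OBJECT_REPLACEMENT = "\ufffc"  # U+FFFC OBJECT REPLACEMENT CHARACTER


def split_streams(parts):
    """Split mixed content into a text stream and a separate object stream.

    `parts` is a list of str (text) and bytes (binary objects). Each object
    is replaced in the text by one U+FFFC; objects are kept in order.
    """
    text_chunks = []
    objects = []
    for part in parts:
        if isinstance(part, bytes):
            text_chunks.append(OBJECT_REPLACEMENT)
            objects.append(part)
        else:
            text_chunks.append(part)
    return "".join(text_chunks), objects


def merge_streams(text, objects):
    """Rebuild the mixed content: the n-th U+FFFC takes the n-th object."""
    remaining = iter(objects)
    parts = []
    for index, chunk in enumerate(text.split(OBJECT_REPLACEMENT)):
        if index > 0:
            # Every split boundary corresponds to exactly one placeholder.
            parts.append(next(remaining))
        if chunk:
            parts.append(chunk)
    return parts
```

    Two objects at the same position simply become two consecutive U+FFFC characters, which is the case where one
    placeholder per object is actually needed.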

    But if your plain-text stream already carries an upper-layer protocol that encapsulates multiple streams, like
    XML or HTML, based on a schema describing the encapsulated streams, or if it sits in a format where no insertion
    position within the text stream is needed because the streams are independent, as with MIME file attachments in
    email, then U+FFFC is not needed at all and should not even be used. The upper-layer protocol will be far better
    and more descriptive, and will offer a richer palette of inclusion/reference options, such as referencing the
    same object several times from distinct positions in the text stream (e.g. with links to anchored objects in a
    DOM structure).
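
    As a small illustration of the MIME case, here is a sketch using Python's standard email package. The helper
    name build_message and the sample file name are made up for the example. The text part and the attachment are
    independent parts of the envelope, so no U+FFFC appears in the text.

```python
from email.message import EmailMessage


def build_message(body_text, attachment_bytes, filename):
    """Build a multipart message: one plain-text part, one binary part."""
    msg = EmailMessage()
    msg.set_content(body_text)  # the plain-text stream
    msg.add_attachment(         # the independent object stream
        attachment_bytes,
        maintype="application",
        subtype="octet-stream",
        filename=filename,
    )
    return msg


msg = build_message("See the attached glyph.", b"\x00\x01", "glyph.bin")
text_part, object_part = list(msg.iter_parts())
```

    Here the envelope itself records which part is text and which is the object, so the text never has to mark
    where the object "goes".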

    For this reason, I have never seen U+FFFC used anywhere in multistream documents (and I really wonder which
    upper-layer protocol needed it and led to its being encoded...).

    So only one use remains: as a visible indicator, in a plain-text-only document (without any upper-layer
    encapsulation protocol for multiple streams), that some object in a rich document could not be fully converted.
    An example is converting an HTML page that contains graphics in the middle of the text, where a graphic stands
    in for a character not yet encoded distinctly in the UCS, e.g. characters shown within a character encoding
    proposal or in its documentation or justification.
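
    A rough sketch of such an HTML-to-plain-text conversion, using Python's standard html.parser module. The class
    and function names are invented for this example, and treating only <img> as an unconvertible object is a
    simplifying assumption.

```python
from html.parser import HTMLParser

OBJECT_REPLACEMENT = "\ufffc"  # U+FFFC OBJECT REPLACEMENT CHARACTER


class PlainTextExtractor(HTMLParser):
    """Collect text content, leaving one U+FFFC where an image was."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # An inline graphic cannot be carried over into plain text,
        # so mark its position with a visible placeholder instead.
        if tag == "img":
            self.chunks.append(OBJECT_REPLACEMENT)

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_plain_text(html):
    parser = PlainTextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)
```

    The resulting text tells the reader that something was there, but nothing about what it was; the conversion is
    lossy by design.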

    In that last case, U+FFFC could replace any part of the document that is currently not convertible to plain
    text, i.e. act as a character substitute in a way similar to the ASCII "SUB" control, or to U+FFFD, which is
    often used as a character substitute (in character mapping tables) when converting plain text between a
    UCS-compatible encoding and some legacy or obscure proprietary 8-bit or multibyte encoding where conversion
    errors/exceptions are not acceptable (in all cases, the encoding conversion will not have roundtrip
    compatibility).
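
    To make the substitution behaviour concrete, here is a minimal Python sketch. The error-handler name
    "sub_control" and the helpers to_legacy and from_legacy are hypothetical names chosen for this example; only
    codecs.register_error and the built-in "replace" handler come from the standard library.

```python
import codecs

SUB = "\x1a"  # ASCII SUB (SUBSTITUTE) control character


def _substitute_with_sub(error):
    """Encode-side handler: emit one SUB per unmappable character."""
    if isinstance(error, UnicodeEncodeError):
        return SUB * (error.end - error.start), error.end
    raise error


# "sub_control" is a name made up for this sketch.
codecs.register_error("sub_control", _substitute_with_sub)


def to_legacy(text, encoding="latin-1"):
    """Convert to a legacy encoding without ever raising an exception."""
    return text.encode(encoding, errors="sub_control")


def from_legacy(data, encoding="ascii"):
    """Convert from a legacy encoding; undecodable bytes become U+FFFD."""
    return data.decode(encoding, errors="replace")
```

    In both directions the original character is gone after conversion, so converting back cannot restore it.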


