From: verdy_p (verdy_p@wanadoo.fr)
Date: Thu Jun 04 2009 - 21:22:42 CDT
> Message of 04/06/09 17:41
> From: "Asmus Freytag"
> To: "William_J_G Overington"
> Cc: "Kenneth Whistler", unicode@unicode.org, "Ruszlán Gaszanov", "Hans Aberg", "Doug Ewell"
> Subject: Re: Invalid code points
>
>
> On 6/4/2009 2:13 AM, William_J_G Overington wrote:
> >
> > Well, no, because the suggestion of using U+FFFC does have a clue for humans as to what might be meant.
> >
> >
> >
> Not really. U+FFFC is supposed to stand on its own in a *text* stream,
> with all the data being in *another* stream.
>
> Your parameter passing example has a single text stream.
>
> Despite what you might want to think, U+FFFC is not the "start of object
> data marker" that you would like it to be.
>
> A./
I agree: the only real use I see for U+FFFC is as a placeholder marking the position where an external binary
object is to be inserted. A single character may be needed when several objects are to be inserted at the same
position, and when that insertion position is not encoded in the upper-layer envelope format that contains the
plain text stream and the object streams.
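
To make the placeholder idea concrete, here is a minimal sketch in Python. It assumes the simplest possible
envelope: a text string plus an ordered list of objects, where the n-th U+FFFC stands for the n-th object. The
function name `interleave` and the sample data are hypothetical, not taken from any real format.

```python
OBJECT_REPLACEMENT = "\uFFFC"  # U+FFFC OBJECT REPLACEMENT CHARACTER


def interleave(text, objects):
    """Pair each U+FFFC in `text` with the next entry of `objects`, in order.

    Returns a list of ("text", str) and ("object", obj) tuples describing
    the reassembled layout.
    """
    parts = text.split(OBJECT_REPLACEMENT)
    # n placeholders split the text into n + 1 parts.
    if len(parts) - 1 != len(objects):
        raise ValueError("placeholder count does not match object count")
    layout = []
    for index, part in enumerate(parts):
        if part:
            layout.append(("text", part))
        if index < len(objects):
            layout.append(("object", objects[index]))
    return layout


# Hypothetical two-stream document: one text stream, one object stream.
text = "See figure \uFFFC and chart \uFFFC."
objects = [b"<png bytes>", b"<svg bytes>"]
layout = interleave(text, objects)
```

Note that the text stream itself carries no data about the objects, only their positions; everything else lives
in the other stream.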
But U+FFFC is not needed at all, and should not even be used, in two cases. The first is when your plain text
stream already carries an upper-layer protocol that allows the encapsulation of multiple streams, like XML or
HTML, based on a schema describing the various encapsulated streams. The second is a format where the insertion
position within the text stream is not even needed because the streams are independent, as in the MIME format
for file attachments in emails. In both cases the upper-layer protocol will be far better and more descriptive,
and will allow a richer palette of inclusion/reference options, such as referencing the same object several
times from distinct positions in the text stream (e.g. with links to anchored objects in a DOM structure).
For this reason, I've never seen U+FFFC used anywhere in multistream documents (and I really wonder which
upper-layer protocol needed it and caused it to be encoded...).
So only one use remains: as a visible indicator, in a plain-text-only document (without any upper-layer
encapsulation protocol for multiple streams), that some object in a rich document could not be fully converted.
One example is converting an HTML page that contains graphics in the middle of the text, where the graphics
stand in for a character that is not yet encoded distinctly in the UCS (e.g. characters shown within a
character encoding proposal, or in its documentation or justification).
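
As a rough illustration of that conversion, the following Python sketch uses the standard `html.parser` module
to flatten HTML to plain text and emit U+FFFC wherever an `<img>` element appears. The class and function names
and the sample markup are my own assumptions, and a real converter would have to handle far more than images.

```python
from html.parser import HTMLParser


class PlainTextExtractor(HTMLParser):
    """Collect text content and mark each <img> with U+FFFC."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # The image cannot be represented in plain text, so leave a
        # visible marker at its position instead of dropping it silently.
        if tag == "img":
            self.chunks.append("\uFFFC")

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_plain_text(html):
    parser = PlainTextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.chunks)


# Hypothetical sample: a not-yet-encoded glyph shown as an inline image.
sample = '<p>The proposed glyph <img src="glyph.png"> has no code point yet.</p>'
plain = html_to_plain_text(sample)
```

The resulting plain text keeps the sentence intact and shows exactly where something was lost.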
In that last case, U+FFFC could replace any part of the document that currently cannot be converted to plain
text. It would act as a substitute, similar to the ASCII "SUB" control or to U+FFFD, which is often used as a
substitute character (in character mapping tables) when converting plain text between a UCS-compatible encoding
and some legacy or obscure proprietary 8-bit or multibyte encoding where conversion errors/exceptions are not
acceptable (in all cases, the encoding conversion will not have roundtrip compatibility).
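
The loss of roundtrip compatibility is easy to demonstrate with Python's built-in codecs, which substitute
U+FFFD when decoding with `errors="replace"` and "?" when encoding. This is only a sketch: the sample bytes are
made up, and it relies on Python's `cp1252` codec treating byte 0x81 as undefined.

```python
# A legacy byte string containing 0x81, which Python's cp1252 codec
# leaves unmapped.
legacy = b"caf\x81"

# Decoding without raising: the unmappable byte becomes U+FFFD.
text = legacy.decode("cp1252", errors="replace")

# Encoding back without raising: U+FFFD has no cp1252 mapping, so the
# codec substitutes "?".
back = text.encode("cp1252", errors="replace")

# The original byte is gone, so the conversion does not round-trip.
roundtrips = back == legacy
```

Once the substitute has been written, nothing in the text records which byte or character it replaced.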