Re: Invalid code points

From: William_J_G Overington (wjgo_10009@btinternet.com)
Date: Thu Jun 04 2009 - 04:13:49 CDT

Next message: Arne Goetje: "Re: Fonts across platforms...."

Previous message: Hans Aberg: "Re: Invalid code points"
Maybe in reply to: Hans Aberg: "Re: Invalid code points"
Next in thread: Asmus Freytag: "Re: Invalid code points"
Reply: Asmus Freytag: "Re: Invalid code points"

On Wednesday 3 June 2009, Kenneth Whistler <kenw@sybase.com> wrote:

> William Overington suggested:
>
> > The suggestion of using b64-encoded binary data could perhaps
> > be adapted by placing a Unicode U+FFFC OBJECT REPLACEMENT CHARACTER
> > in front of the b64-encoded binary data. That way, the parameter
> > passing would always be in Unicode characters and the presence of
> > a U+FFFC character would indicate that subsequent characters in
> > the parameter should be interpreted as b64-encoded binary data.
>
> It may perhaps be belaboring the obvious, but U+FFFC OBJECT
> REPLACEMENT CHARACTER is not defined that way, and would not
> indicate that (or anything else) about subsequent characters
> in a string parameter.

Ken is correct.

>
> Any attempt to use U+FFFC in that way would be very unlikely to
> be interpreted as such by any Unicode-conformant system, and
> in fact is nothing more than an arbitrary attempt to establish
> a text convention which would consist of a higher-level protocol.

Well, not quite arbitrary. The problem is to develop a demonstration of a new idea of passing objects using a text parameter. Ruszlán Gaszanov asked "What's wrong with passing b64-encoded binary data?" I replied that "Passing b64-encoded binary data could be ambiguous as to whether it was text or b64-encoded binary data." and suggested a way in which either text or b64-encoded binary data could be passed as a parameter.
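The ambiguity mentioned above can be illustrated with a short Python sketch. The helper name looks_like_base64 is hypothetical and is not part of any proposal in this thread; it only asks whether a string happens to decode as base64, which some ordinary words do.

```python
import base64


def looks_like_base64(s: str) -> bool:
    """Return True if s decodes cleanly as standard base64.

    Hypothetical helper for illustration only. A True result does not
    prove the sender meant binary data, which is exactly the ambiguity.
    """
    try:
        base64.b64decode(s, validate=True)
        return True
    except ValueError:  # binascii.Error is a subclass of ValueError
        return False


# "Data" is an ordinary English word, yet it is also well-formed base64.
print(looks_like_base64("Data"))
# "Hello" has a length that is not a multiple of four, so it is rejected.
print(looks_like_base64("Hello"))
```

A receiver that sees only the string "Data" therefore cannot tell, from the characters alone, whether it was meant as text or as three encoded bytes.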

The Unicode Standard includes the following document.

http://www.unicode.org/versions/Unicode5.0.0/ch16.pdf

The document has the following on page 26.

quote

U+FFFC. The U+FFFC object replacement character is used as an insertion point for objects located within a stream of text. All other information about the object is kept outside the character data stream. Internally it is a dummy character that acts as an anchor point for the object’s formatting information. In addition to assuring correct placement of an object in a data stream, the object replacement character allows the use of general stream-based algorithms for any textual aspects of embedded objects.

end quote

So, my suggestion needs to be altered so that the parameter passing mechanism, upon detecting a U+FFFC character, places all of the characters that follow the U+FFFC character into a separate storage place. The passed parameter is then true Unicode that may, but need not, contain a U+FFFC character.
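A minimal Python sketch of the altered suggestion follows. It rests on assumptions that are not stated in this thread: that only the first U+FFFC in the parameter is significant, that the U+FFFC itself stays in the text as the anchor point, and that the function name split_parameter is purely illustrative. This is a private higher-level convention, not behaviour defined by the Unicode Standard.

```python
import base64

OBJECT_REPLACEMENT = "\uFFFC"  # U+FFFC OBJECT REPLACEMENT CHARACTER


def split_parameter(param: str):
    """Split a parameter at the first U+FFFC (hypothetical convention).

    Returns (text, stored). The text keeps the U+FFFC as an anchor; the
    characters that followed it are returned separately, undecoded.
    If no U+FFFC is present, stored is None and text is the whole parameter.
    """
    text, marker, tail = param.partition(OBJECT_REPLACEMENT)
    if not marker:
        return param, None
    return text + marker, tail


# Usage: a caption followed by an object carried as b64-encoded characters.
payload = base64.b64encode(b"\x00\x01\x02").decode("ascii")
text, stored = split_parameter("Figure: " + OBJECT_REPLACEMENT + payload)
raw = base64.b64decode(stored)  # decoding happens outside the text stream
```

Keeping the stored characters undecoded inside split_parameter leaves the choice of how to interpret them to the receiving application.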

> One could equally well (and probably with equal outcome) assert
> that a U+25E7 SQUARE WITH LEFT HALF BLACK character would indicate
> that subsequent characters in a parameter should be interpreted
> as b64-encoded binary data.

Well, no, because U+FFFC, unlike U+25E7, does give a human reader a clue as to what might be meant.

> Or for that matter, that subsequent
> characters in a string should be interpreted as a chocolate chip
> cookie recipe.
>
> --Ken

Well, U+003C LESS-THAN SIGN gets used for many purposes in some documents.

William Overington

4 June 2009


This archive was generated by hypermail 2.1.5 : Thu Jun 04 2009 - 04:16:12 CDT