Re: A few questions about encoding discovery, copying text, and pasting text in one encoding into text in another encoding

From: Philippe Verdy <>
Date: Wed, 19 Dec 2012 21:12:42 +0100

Let's imagine a legacy editor (application A) working only in legacy
encoding A. Suppose that the user copies some text in it to place it in the
clipboard. The application will then use the clipboard API compatible with
this encoding.

The system clipboard will automatically provide the support needed to
convert this legacy encoding A to Unicode if necessary.

Now the user pastes into application U, which works in Unicode: that
application uses the clipboard API working in Unicode. As the text in the
clipboard is not in Unicode, the system will convert it implicitly, so it
will just tell application U that the clipboard contains text already
encoded in Unicode. Application U will pick that format, and the system
will convert the data sent by application A from its legacy encoding A to
Unicode before returning it to application U. Application U itself has
nothing to do.
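
A minimal sketch of this first case in Python. The function names are
hypothetical, and "cp1252" is only an assumed stand-in for legacy encoding
A; no real clipboard API is being called here.

```python
# Assumption: "cp1252" plays the role of legacy encoding A.
LEGACY_A = "cp1252"


def copy_from_app_a(text: str) -> bytes:
    """Application A puts bytes in its own legacy encoding on the clipboard."""
    return text.encode(LEGACY_A)


def paste_into_app_u(clipboard_bytes: bytes, source_encoding: str) -> str:
    """The system (not application U) decodes the legacy bytes to Unicode."""
    return clipboard_bytes.decode(source_encoding)


stored = copy_from_app_a("naïve café")
pasted = paste_into_app_u(stored, LEGACY_A)
```

Application U only ever sees `pasted`, a Unicode string; the decoding step
sits entirely on the system side.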

Now the user pastes the same text into application B, which works in another
legacy encoding B. The system clipboard can automatically convert from A
to Unicode, but may fail to convert the result from Unicode to B. For
this reason the clipboard exposes to application B a preferred
text format in Unicode, and another text format in encoding B. It
is up to application B to determine whether it can handle the preferred
(advanced) format U and convert it itself, or drop that proposed format and
pick the basic format based on B (i.e. the native encoding used by that
legacy application). If application B cannot support Unicode at all, it
will just query the clipboard, asking it to restrict the formats to only
those that application B understands, so format U will not even be
exposed to, or pickable by, application B. The same situation occurs when
the user copies from application U to application A.
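
To illustrate why the second hop can fail, here is a small Python sketch.
The two code pages are assumptions chosen for the example (a Western
European one for A, a Cyrillic one for B), and `convert_a_to_b` is a
hypothetical helper, not part of any clipboard API.

```python
LEGACY_A = "cp1252"      # assumption: Western European code page
LEGACY_B = "iso8859_5"   # assumption: Cyrillic code page


def convert_a_to_b(data: bytes):
    """Return the text in encoding B, or None if B cannot represent it."""
    unicode_text = data.decode(LEGACY_A)       # A -> Unicode
    try:
        return unicode_text.encode(LEGACY_B)   # Unicode -> B may fail
    except UnicodeEncodeError:
        return None


ok = convert_a_to_b("plain ASCII".encode(LEGACY_A))
failed = convert_a_to_b("café".encode(LEGACY_A))
```

Here `ok` holds the converted bytes, while `failed` is `None` because "é"
has no code point in ISO 8859-5.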

In general a clipboard contains one or more formats that the copying
application can generate, but the set of formats available in the clipboard
and exposed to other applications for pasting is somewhat larger, as the
clipboard implicitly performs some conversions (e.g. from a legacy encoding
to Unicode, and/or from some system-standard rich-text format to plain
text, or from a known graphic format to a common interchange format, using
installed codecs), so that pasting applications have more formats to pick
from and work with.
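
As a rough sketch of how the exposed set can be larger than the provided
set, assume the system keeps a table of installed converters. The format
names and the `CONVERTERS` table below are invented for illustration.

```python
# Hypothetical converter table: (source format, target format) -> function.
CONVERTERS = {
    ("text/legacy-A", "text/unicode"): lambda b: b.decode("cp1252"),
    ("text/unicode", "text/legacy-B"): lambda s: s.encode("iso8859_5"),
}


def exposed_formats(provided):
    """Provided formats first, then formats reachable by one conversion."""
    result = list(provided)
    for (src, dst) in CONVERTERS:
        if src in provided and dst not in result:
            result.append(dst)
    return result


formats = exposed_formats(["text/legacy-A"])
unknown = exposed_formats(["image/x-private"])
```

With only a legacy-A format on the clipboard, `formats` also lists the
Unicode format; a format with no installed converter is exposed as-is.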

The list of formats exposed to pasting applications that query the
clipboard is generally ordered (from the preferred one, which keeps the
source format, to the next format, which is more interoperable, to a few
others also supported by the codecs installed on the system). In most
cases, the pasting application will pick the first format in this list that
it can understand.
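
The selection rule above can be sketched in a few lines of Python. The
format names are hypothetical placeholders.

```python
def pick_format(offered, understood):
    """Return the first offered format the pasting application understands."""
    for fmt in offered:
        if fmt in understood:
            return fmt
    return None  # nothing usable: the clipboard looks empty to this caller


# Ordered from the source format to more interoperable ones.
offered = ["text/legacy-A", "text/unicode", "text/legacy-B"]

choice_u = pick_format(offered, {"text/unicode"})
choice_b = pick_format(offered, {"text/legacy-B"})
choice_b_modern = pick_format(offered, {"text/unicode", "text/legacy-B"})
choice_none = pick_format(offered, {"image/png"})
```

A version of application B that also understands Unicode ends up with the
Unicode format, because it appears earlier in the ordered list.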

Once the pasting application has picked a format, it queries the data
in that format, and the system clipboard uses the system-provided
codec to query and convert the data supplied by the application
that "sent" this data to the clipboard.

In fact, in many cases the data is still not converted when the user
"copies" data into the clipboard: what the copying application does is
just create an internal copy, and then expose to the clipboard a
dataset saying that the data is available in that format and that the
application can be queried to retrieve it in that format. So no conversion
occurs before the user pastes from the clipboard (in the same application
or in another one). The source application can also declare that it will
convert the data itself, by exposing another format that it will generate
on the fly from the data that it has kept in its own internal buffer.
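
A toy model of this deferred rendering, in Python. The `Clipboard` class
and its method names are invented for this sketch and do not correspond to
any real system API.

```python
class Clipboard:
    """Toy clipboard that stores promises (callables) instead of data."""

    def __init__(self):
        self._renderers = {}  # format name -> zero-argument callable

    def offer(self, fmt, render):
        self._renderers[fmt] = render

    def formats(self):
        return list(self._renderers)

    def get(self, fmt):
        # Rendering happens only now, at paste time.
        return self._renderers[fmt]()


calls = []        # records which renderers actually ran
buffer = "naïve"  # the source application's private copy


def render_unicode():
    calls.append("text/unicode")
    return buffer


def render_legacy_a():
    calls.append("text/legacy-A")
    return buffer.encode("cp1252")


clip = Clipboard()
clip.offer("text/unicode", render_unicode)
clip.offer("text/legacy-A", render_legacy_a)
calls_after_copy = list(calls)       # nothing has been rendered yet
pasted = clip.get("text/legacy-A")   # the conversion happens here
```

Copying only registers the two formats; the legacy bytes are produced when,
and only if, a pasting application asks for them.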

Now suppose the user quits the source application: the application will
instruct the clipboard to keep a copy of the data somewhere (in a
temporary file or shared memory block), and it is the clipboard itself that
will query the application to provide the data for each of the formats that
were exposed by the application; when this is done, the source
application can quit. Some variants include a background thread that
starts querying the application for the data kept in the source application,
so that the source application can free this data sooner. Another variant
consists in the source application specifying to the clipboard that the
copied data is not just available in that format, but is already available
in a memory block that will be placed immediately in shared memory (with
the clipboard keeping a shared lock on that memory block, so that the
source application may immediately free this block from its own managed

Various schemes are possible to handle those transactions between the three
parties: a copying application, the system clipboard, and a pasting
application. They are generally flexible enough to avoid unnecessary
data conversions while also maximizing reusability, with the system
clipboard using any other available system codecs only when
necessary.
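
One of the schemes mentioned above, in which the clipboard pulls every
promised format from the source before that application quits, might look
like this in Python. All class and method names are hypothetical.

```python
class Clipboard:
    """Toy clipboard holding promised formats until they are flushed."""

    def __init__(self):
        self._promised = {}  # format -> callable into the source application
        self._stored = {}    # format -> data owned by the clipboard

    def offer(self, fmt, render):
        self._promised[fmt] = render

    def flush(self):
        # Query the source once per promised format and keep the results.
        for fmt, render in self._promised.items():
            self._stored[fmt] = render()
        self._promised.clear()

    def get(self, fmt):
        if fmt in self._stored:
            return self._stored[fmt]
        return self._promised[fmt]()


class SourceApp:
    def __init__(self, text):
        self.buffer = text

    def copy_to(self, clipboard):
        clipboard.offer("text/unicode", lambda: self.buffer)
        clipboard.offer("text/legacy-A", lambda: self.buffer.encode("cp1252"))

    def quit(self, clipboard):
        clipboard.flush()   # hand the data over first
        self.buffer = None  # then release the internal copy


clip = Clipboard()
app = SourceApp("café")
app.copy_to(clip)
app.quit(clip)
pasted_legacy = clip.get("text/legacy-A")
pasted_unicode = clip.get("text/unicode")
```

After `quit`, both pastes are served from the clipboard's own storage, so
the source application's buffer is no longer needed.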

The system clipboard will also expose several APIs, allowing various
applications to work with one or the other, each API having its own set of
converters implicitly used by the system clipboard with system codecs. Some
of these codecs are lossless, like ISO 8859-* or Windows-* plain text to
Unicode plain text. Others are lossy, but they are still added implicitly
at the end of the list of formats, as non-preferred formats, in only some
of these APIs and not necessarily in another one that was not designed to
be compatible with that format.
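
The difference between a lossless and a lossy codec can be checked
directly in Python. ISO 8859-1 is used here only as one example of the
ISO 8859-* family, and the "replace" error handler is an assumed stand-in
for whatever a lossy system converter would do.

```python
# Lossless direction: every ISO 8859-1 byte maps to exactly one code point.
all_bytes = bytes(range(256))
as_unicode = all_bytes.decode("iso8859_1")
lossless = as_unicode.encode("iso8859_1") == all_bytes

# Lossy direction: Unicode text that ISO 8859-1 cannot fully represent.
original = "Ω café"
lossy_bytes = original.encode("iso8859_1", errors="replace")
round_tripped = lossy_bytes.decode("iso8859_1")
```

The legacy-to-Unicode round trip preserves every byte, whereas the Greek
letter is replaced by "?" on the way back to the legacy encoding.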

If the source data is not convertible by the clipboard itself, or its
conversion cannot be made compatible with the API used by the pasting
application, the clipboard will appear empty to that application,
though it will not seem empty to other compatible applications (such as a
"Clipboard Viewer" application, or an "Extended Clipboard Viewer"
application that also keeps a history of clipboard data in all formats,
and from which the user can select the content to expose to pasting
Received on Wed Dec 19 2012 - 14:15:44 CST
