Re: Deleting Lone Surrogates

From: Richard Wordingham <richard.wordingham_at_ntlworld.com>
Date: Mon, 5 Oct 2015 20:58:48 +0100

On Mon, 5 Oct 2015 16:51:25 +0200
Philippe Verdy <verdy_p_at_wanadoo.fr> wrote:

> 2015-10-05 13:50 GMT+02:00 Martin J. Dürst <duerst_at_it.aoyama.ac.jp>:
>
> > In an editing tool (of which an editing interface is a part), a
> > lone surrogate should just be removed! Apparently, that's what
> > happens in Richard's case, but only eventually.

> Not silently! Even if this removal is required to go on editing,
> the user must be notified of it, as it may occur in unedited parts
> of the file (and it may be a sign that the document is not fully
> plain text, so the user should not save the edited file).
> If this is caused by a quirk in the user input (a defect of the input
> mode or keyboard layout), there should be a notification.

The lone surrogates in this case are (as I surmise) caused by the user
input being misinterpreted. A program running under X that receives the
same sequence of keystrokes is delivered the strings U+1148F, U+114C0,
U+0008, U+114BF, and I have no reason to doubt that the offending
program is receiving the same sequence. My working hypothesis is that
this is being simplified to U+1148F, U+D805, U+114BF, with the U+0008
(backspace) removing only the trailing surrogate of U+114C0; the
presence of U+D805 is a program error. I can reproduce the problem in a
previously empty file.
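
The hypothesis can be illustrated with a minimal Python sketch. It
assumes the editor keeps its text as UTF-16 code units and treats
U+0008 as "delete one unit"; the function names naive_apply and
careful_apply are hypothetical and do not describe the actual
program's code.

```python
HIGH = range(0xD800, 0xDC00)  # leading (high) surrogates
LOW = range(0xDC00, 0xE000)   # trailing (low) surrogates


def to_utf16_units(cp):
    """Return the UTF-16 code units for one Unicode code point."""
    if cp < 0x10000:
        return [cp]
    v = cp - 0x10000
    return [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]


def naive_apply(keystrokes):
    """Assumed buggy behaviour: backspace removes a single code unit."""
    buf = []
    for cp in keystrokes:
        if cp == 0x0008:
            if buf:
                buf.pop()
        else:
            buf.extend(to_utf16_units(cp))
    return buf


def careful_apply(keystrokes):
    """Backspace removes a whole surrogate pair when one precedes it."""
    buf = []
    for cp in keystrokes:
        if cp == 0x0008:
            if buf:
                last = buf.pop()
                # If we just removed a trailing surrogate, also remove
                # the leading surrogate that goes with it.
                if last in LOW and buf and buf[-1] in HIGH:
                    buf.pop()
        else:
            buf.extend(to_utf16_units(cp))
    return buf


keys = [0x1148F, 0x114C0, 0x0008, 0x114BF]
naive = naive_apply(keys)      # leaves a lone U+D805 in the middle
careful = careful_apply(keys)  # U+1148F followed by U+114BF only
```

With the naive version the buffer ends up as D805 DC8F D805 D805 DCBF,
i.e. U+1148F, a lone U+D805, then U+114BF, which matches the sequence
conjectured above.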

Now, on Windows, old MS keyboards at least deliver supplementary
characters in a pair of WM_CHAR messages. If one of these ligatures
were corrupted so that only the first of the messages was delivered, it
is not obvious to me how a program would readily detect the omission.
It would only become obvious when the start of the next *character* was
received.
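
To make the timing concrete, here is a rough Python sketch of a
receiver that is handed one UTF-16 code unit at a time, as a stand-in
for a stream of WM_CHAR messages. The generator decode_units and its
event tuples are hypothetical. A leading surrogate has to be held back
until the next unit shows whether it was the first half of a pair, so
a missing second half is reported only once the following character
(or the end of input) arrives.

```python
def decode_units(units):
    """Yield ('char', code_point) or ('lone', unit) for each decoded item.

    `units` is any iterable of UTF-16 code units, delivered one by one.
    """
    pending = None  # a leading surrogate waiting for its partner
    for u in units:
        if pending is not None:
            if 0xDC00 <= u <= 0xDFFF:
                # Pair completed: combine into a supplementary code point.
                cp = 0x10000 + ((pending - 0xD800) << 10) + (u - 0xDC00)
                yield ('char', cp)
                pending = None
                continue
            # Only now is the omission visible.
            yield ('lone', pending)
            pending = None
        if 0xD800 <= u <= 0xDBFF:
            pending = u
        elif 0xDC00 <= u <= 0xDFFF:
            yield ('lone', u)  # trailing surrogate with no leader
        else:
            yield ('char', u)
    if pending is not None:
        yield ('lone', pending)  # input ended mid-pair


events = list(decode_units([0xD805, 0xDC8F, 0xD805, 0xD805, 0xDCBF]))
```

For that input the events are U+1148F, a lone U+D805, and U+114BF; the
lone surrogate is reported only when the unit after it turns out not
to be a trailing surrogate.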

Richard.
Received on Mon Oct 05 2015 - 14:59:59 CDT
