Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

From: Richard Wordingham via Unicode <>
Date: Wed, 31 May 2017 07:08:37 +0100

On Fri, 26 May 2017 21:41:49 +0000
Shawn Steele via Unicode <> wrote:

> I totally get the forward/backward scanning in sync without decoding
> reasoning for some implementations, however I do not think that the
> practices that benefit those should extend to other applications that
> are happy with a different practice.

> In either case, the bad characters are garbage, so neither approach
> is "better" - except that one or the other may be more conducive to
> the requirements of the particular API/application.

There's a potential issue with input methods that indirectly edit the
backing store. For example, GTK input methods (e.g. via the function
gtk_im_context_delete_surrounding()) can delete an amount of text
specified in characters, not storage units. (Deletion by storage
units is not available in this interface.) If the backing store starts
out corrupt, this might cause utter confusion or worse, because the
number of characters a run of ill-formed bytes counts as depends on how
many U+FFFD the decoder substitutes for it. A corrupt backing store is
normally manually correctable if most of the text is
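
A minimal sketch of how that confusion can arise, assuming the input
method counts characters with one substitution practice while the code
owning the backing store maps the count back to bytes with the other.
The helpers segments and delete_chars are hypothetical, not GTK API;
unlike gtk_im_context_delete_surrounding(), whose offset is relative to
the cursor, this sketch counts from the start of the buffer.

```python
def _lead_info(b):
    """(continuation bytes needed, allowed range of the second byte) for a
    UTF-8 lead byte, or None if b cannot start a sequence."""
    if 0xC2 <= b <= 0xDF:
        return 1, (0x80, 0xBF)
    if b == 0xE0:
        return 2, (0xA0, 0xBF)
    if 0xE1 <= b <= 0xEC or 0xEE <= b <= 0xEF:
        return 2, (0x80, 0xBF)
    if b == 0xED:
        return 2, (0x80, 0x9F)
    if b == 0xF0:
        return 3, (0x90, 0xBF)
    if 0xF1 <= b <= 0xF3:
        return 3, (0x80, 0xBF)
    if b == 0xF4:
        return 3, (0x80, 0x8F)
    return None


def segments(data: bytes, per_byte: bool = False) -> list[tuple[int, int]]:
    """Byte span (start, end) of each character a decoder would emit,
    counting U+FFFD per maximal subpart or, if per_byte, per bad byte."""
    spans = []
    i = 0
    while i < len(data):
        j = i + 1
        complete = data[i] < 0x80         # ASCII is always one character
        info = None if complete else _lead_info(data[i])
        if info is not None:
            need, (lo, hi) = info
            while j < len(data) and j - i <= need:
                lo_j, hi_j = (lo, hi) if j == i + 1 else (0x80, 0xBF)
                if not lo_j <= data[j] <= hi_j:
                    break
                j += 1
            complete = j - i == need + 1
        if complete or not per_byte:
            spans.append((i, j))          # one character covers bytes i..j-1
        else:
            spans.extend((k, k + 1) for k in range(i, j))  # one per bad byte
        i = j
    return spans


def delete_chars(data: bytes, char_offset: int, n_chars: int,
                 per_byte: bool = False) -> bytes:
    """Hypothetical helper: delete n_chars characters starting at
    char_offset, counted from the start of the buffer."""
    spans = segments(data, per_byte)
    if not 0 <= char_offset <= char_offset + n_chars <= len(spans):
        raise ValueError("character range out of bounds")
    if n_chars == 0:
        return data
    start = spans[char_offset][0]
    end = spans[char_offset + n_chars - 1][1]
    return data[:start] + data[end:]


buf = b"x\xe2\x82yz"                            # x, truncated U+20AC, y, z
print(delete_chars(buf, 1, 2))                  # b'xz'
print(delete_chars(buf, 1, 2, per_byte=True))   # b'xyz'
```

The same request, "delete two characters at offset 1", removes the bad
bytes and the following "y" under the maximal-subpart count, but only
the two bad bytes under the per-byte count.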

Received on Wed May 31 2017 - 01:08:55 CDT

This archive was generated by hypermail 2.2.0 : Wed May 31 2017 - 01:08:56 CDT