Re: Corner cases (was: Re: UTF-16 Encoding Scheme and U+FFFE)

From: Richard Wordingham <>
Date: Thu, 5 Jun 2014 18:40:09 +0100

On Thu, 5 Jun 2014 09:41:07 +0200
Philippe Verdy <> wrote:

> You'll probably want to sync on the first newline control and then
> proceed from that point. But now if you have those devices configured
> heterogeneously and generating their own output encoding you won't
> necessarily know how it is encoded even if all of them use some UTF of
> Unicode. So the stream will regularly repost an encoding mark, for
> example at the beginning of each dated logged entry, and this could be
> just an encoded BOM (even with UTF-8, or some other UTF like UTF-16,
> which would be more likely if the text were essentially in an
> East-Asian (CJK) language).

Of course, this is not an arbitrary fragment. In this location, at the
start of a log entry, ZWNBSP (U+FEFF) will have almost no effect. (The
only mechanisms I can think of are character counts and the text being
pasted immediately after another word.) This, and the early belief that
U+FFFE would not occur in Unicode text, are why U+FEFF was chosen as
the byte order mark.
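
As a rough illustration of the per-entry marking described in the quoted
text, here is a minimal Python sketch. It assumes the stream has already
been split into entries (the splitting itself is not shown), that each
entry may begin with an encoded BOM, and that only UTF-8 and UTF-16 are
in play. The function name decode_entry and the UTF-8 fallback are my own
assumptions for the example, not anything the devices are known to do.

```python
import codecs

# BOM byte sequences mapped to the codec that decodes the bytes after them.
# UTF-32 is deliberately left out: its little-endian BOM begins with the
# same two bytes as the UTF-16LE BOM and would need to be checked first.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_entry(entry: bytes, default: str = "utf-8") -> str:
    """Decode one log entry, using a leading BOM (if any) to pick the codec.

    The BOM is stripped, so it does not survive as U+FEFF in the result.
    Entries without a BOM fall back to `default` (an assumption).
    """
    for bom, codec in BOMS:
        if entry.startswith(bom):
            return entry[len(bom):].decode(codec)
    return entry.decode(default)


if __name__ == "__main__":
    raw = codecs.BOM_UTF16_BE + "2014-06-05 ok".encode("utf-16-be")
    print(decode_entry(raw))
    # Decoding without stripping keeps U+FEFF as the first character,
    # which is where the character-count difference comes from.
    print(len(raw.decode("utf-16-be")) - len(decode_entry(raw)))
```

If the BOM is left in place instead of stripped, the decoded entry is one
character longer and starts with U+FEFF, which is the character-count
effect mentioned above.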

