Re: Corner cases (was: Re: UTF-16 Encoding Scheme and U+FFFE)

From: Richard Wordingham <richard.wordingham_at_ntlworld.com>
Date: Thu, 5 Jun 2014 18:40:09 +0100

On Thu, 5 Jun 2014 09:41:07 +0200
Philippe Verdy <verdy_p_at_wanadoo.fr> wrote:

> You'll probably want to sync on the first newline control and then
> proceed from that point. But now if you have those devices configured
> heterogeneously and generating their own output encoding, you won't
> necessarily know how it is encoded even if all of them use some UTF of
> Unicode. So the stream will regularly repost an encoding mark, for
> example at the beginning of each dated logged entry, and this could be
> just an encoded BOM (even with UTF-8, or some other UTF like UTF-16,
> which would be more likely if the text were essentially in an
> East-Asian (CJK) language).

Of course, this is not an arbitrary fragment. In this location, ZWNBSP
will have almost no effect. (The only mechanisms I can think of are
character counts and the text being pasted immediately after another
word.) This, and the early belief that U+FFFE would not occur in
Unicode text, are why U+FEFF was chosen as the byte order mark.
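
As a rough illustration of the scheme described in the quoted text, here is
a minimal Python sketch that uses a leading BOM to pick the UTF for a single
log entry. The helper name decode_entry and the UTF-8 fallback are my own
assumptions, not anything from the thread; only the BOM constants come from
Python's standard codecs module. It also shows the character-count effect:
if the mark is left in place, the decoded text begins with U+FEFF and is one
character longer.

```python
import codecs

# BOM signatures, checked longest first so that the UTF-32 LE mark
# (FF FE 00 00) is not mistaken for the UTF-16 LE mark (FF FE).
# Caveat: UTF-16 LE text that starts with a BOM followed by U+0000
# would be misread as UTF-32 LE by this simple check.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_entry(raw, default="utf-8"):
    """Decode one log entry (hypothetical helper).

    If the bytes start with a BOM, use it to choose the encoding and
    strip it; otherwise fall back to `default` (an assumption here).
    Returns (encoding_name, text).
    """
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name, raw[len(bom):].decode(name)
    return default, raw.decode(default)


# An entry from a device emitting big-endian UTF-16 with a leading BOM.
entry = codecs.BOM_UTF16_BE + "日本語".encode("utf-16-be")
encoding, text = decode_entry(entry)

# Decoding without stripping the mark leaves U+FEFF as the first
# character, so the character count goes up by one.
unstripped = (codecs.BOM_UTF8 + b"abc").decode("utf-8")
```

Whether a real log reader should strip the mark or pass it through as ZWNBSP
is a design choice; the sketch strips it because it is being used purely as
an encoding signature at the start of each entry.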

Richard.
Received on Thu Jun 05 2014 - 12:41:09 CDT
