Re: Corner cases (was: Re: UTF-16 Encoding Scheme and U+FFFE)

From: Doug Ewell <>
Date: Wed, 04 Jun 2014 15:48:02 -0700

Richard Wordingham <richard dot wordingham at ntlworld dot com> wrote:

> The example that's usually given [of U+FEFF at the start of a stream]
> is that of a text file sliced into segments to avoid file size limits.
> In these cases, there is the risk that U+FEFF as ZWNBSP will wind up
> at the start of a segment and be stripped.

Nope, that's exactly the case I was excluding when I wrote:

> 3. U+FEFF [as a zero-width no-break space] at the beginning of a
> stream (note: not "packet" or arbitrary cutoff point)

If you are processing arbitrary fragments of a stream, without knowledge
of preceding fragments, as in this example, then you have no business
making *any* changes to that fragment based on interpretation of that
fragment as Unicode text. Your sole responsibilities at that point are
to pass the fragments, intact, from one process to the next, or to
disassemble and reassemble them.
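A rough illustration of the point, under stated assumptions: the stream is UTF-16BE, it is cut into 6-byte segments (an even size, so no code unit is split), and the helper names `split_bytes`, `naive_decode` and `reassemble_then_decode` are hypothetical. A process that treats each segment as the start of a new text stream strips a U+FEFF that was really a ZWNBSP in mid-text. A process that passes the segments through intact and interprets only the reassembled stream does not.

```python
from typing import List

ZWNBSP = "\ufeff"  # U+FEFF: a BOM at the very start of a stream, a ZWNBSP anywhere else


def split_bytes(data: bytes, size: int) -> List[bytes]:
    """Cut a byte stream into fixed-size segments with no regard for text content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def naive_decode(segments: List[bytes]) -> str:
    """Wrong: treats every segment as a standalone stream and strips a leading U+FEFF."""
    out = []
    for seg in segments:
        text = seg.decode("utf-16-be")
        if text.startswith(ZWNBSP):
            text = text[1:]  # also removes a ZWNBSP that happens to start a segment
        out.append(text)
    return "".join(out)


def reassemble_then_decode(segments: List[bytes]) -> str:
    """Right: rejoin the untouched segments, then interpret only the whole stream."""
    text = b"".join(segments).decode("utf-16-be")
    if text.startswith(ZWNBSP):
        text = text[1:]  # only the true beginning of the stream can carry a BOM
    return text


# A BOM, "ab", a ZWNBSP, then "cd"; with 6-byte segments the ZWNBSP starts segment 2.
original = ZWNBSP + "ab" + ZWNBSP + "cd"
data = original.encode("utf-16-be")
segments = split_bytes(data, 6)

print(repr(naive_decode(segments)))            # the mid-text ZWNBSP is lost
print(repr(reassemble_then_decode(segments)))  # the mid-text ZWNBSP survives
```

The even segment size is chosen only so the naive version can decode at all; a real splitter may cut between the two bytes of a code unit (or between the halves of a surrogate pair), which is one more reason a fragment should not be decoded on its own.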

Doug Ewell | Thornton, CO, USA | @DougEwell
Unicode mailing list
Received on Wed Jun 04 2014 - 17:49:34 CDT
