Re: Corner cases (was: Re: UTF-16 Encoding Scheme and U+FFFE)

From: Asmus Freytag <>
Date: Wed, 04 Jun 2014 11:40:11 -0700

On 6/4/2014 11:26 AM, Doug Ewell wrote:
> Sorry, I left out an important detail.
> I wrote:
>> 3. U+FEFF at the beginning of a stream (note: not "packet" or
>> arbitrary cutoff point)
> I meant U+FEFF as a zero-width no-break space. Obviously it is very
> common to see U+FEFF as a signature or BOM.
> My underlying question here is, how common is it that the producer of a
> stream actually intends this character *at the start of a stream* to be
> a ZWNBSP, not to be stripped lest the actual text content be altered?

Its semantics were chosen at the time so that it would make no sense at
the start of text, and so that the character would be invisible in most
situations. What remained of those semantics was later taken over by
U+2060 WORD JOINER, so there is now NO use for U+FEFF as part of text.

Its use as part of the signature/BOM convention has always been clear.
If you stick this at the front, readers will treat it as a byte order
mark and may byte-reverse your data; that should weed out accidental use
pretty quickly :) It should also keep people from getting "cute" with it
in other ways.
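To make the reader-side convention concrete, here is a minimal Python sketch of how a UTF-16 reader might pick the byte order from an initial signature. The function names are hypothetical (not from any library or from this thread), and the sketch assumes the plain UTF-16 encoding scheme with no higher-level protocol: FE FF means big-endian, FF FE means little-endian, and with no signature the data is read as big-endian.

```python
def detect_utf16_byte_order(data: bytes) -> tuple[str, int]:
    """Return (codec_name, signature_length) for a UTF-16 byte stream.

    Hypothetical helper: FE FF -> big-endian, FF FE -> little-endian,
    and in both cases the two signature bytes are consumed. Without a
    signature, fall back to big-endian per the UTF-16 encoding scheme.
    """
    if data[:2] == b"\xfe\xff":
        return "utf-16-be", 2
    if data[:2] == b"\xff\xfe":
        return "utf-16-le", 2
    return "utf-16-be", 0


def decode_utf16(data: bytes) -> str:
    """Decode UTF-16 data, treating an initial U+FEFF as a signature only."""
    codec, skip = detect_utf16_byte_order(data)
    # The signature bytes are dropped, so the initial U+FEFF never
    # reaches the text as a ZWNBSP.
    return data[skip:].decode(codec)
```

For example, `decode_utf16(b"\xff\xfeH\x00i\x00")` and `decode_utf16(b"\xfe\xff\x00H\x00i")` both yield `"Hi"`. Decoding with the explicit `utf-16-be`/`utf-16-le` codecs, rather than Python's generic `utf-16`, keeps the no-signature default under the sketch's control instead of depending on the platform's native byte order.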

So I would think that, for this particular code point, you can safely
assume that such data is either buggy or test data.

Buggy data you just byte-reverse as requested, and let the user take the
consequences. :)
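The "byte-reverse as requested" policy can be sketched as follows. This is a minimal illustration under stated assumptions, not any particular library's behavior: `swap_bytes` and `read_utf16` are hypothetical names, a reversed signature (FF FE) triggers a literal swap of every 16-bit code unit, and a leading U+FEFF is always stripped rather than kept as a ZWNBSP.

```python
def swap_bytes(data: bytes) -> bytes:
    """Reverse each 16-bit code unit: AB CD -> BA DC."""
    if len(data) % 2:
        raise ValueError("UTF-16 data must have an even number of bytes")
    out = bytearray(data)
    # Even positions take the odd bytes and vice versa.
    out[0::2], out[1::2] = data[1::2], data[0::2]
    return bytes(out)


def read_utf16(data: bytes) -> str:
    """Honor the signature literally, then drop it from the text."""
    if len(data) % 2:
        raise ValueError("UTF-16 data must have an even number of bytes")
    if data[:2] == b"\xff\xfe":
        # Reversed signature: byte-reverse the whole stream as requested,
        # whether or not the rest of the data actually matches it.
        data = swap_bytes(data)
    # Data is now treated as big-endian; an initial U+FEFF is the
    # signature, never a zero-width no-break space.
    text = data.decode("utf-16-be")
    return text[1:] if text.startswith("\ufeff") else text
```

A well-formed little-endian stream, `b"\xff\xfe" + "Hi".encode("utf-16-le")`, reads back as `"Hi"`. A buggy stream that puts the little-endian signature in front of big-endian text, `b"\xff\xfe" + "Hi".encode("utf-16-be")`, is still byte-reversed as the signature requests and comes out as `"\u4800\u6900"`: the reader does what it was told, and the user takes the consequences.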

> --
> Doug Ewell | Thornton, CO, USA
> | @DougEwell

Unicode mailing list
Received on Wed Jun 04 2014 - 13:40:45 CDT

This archive was generated by hypermail 2.2.0 : Wed Jun 04 2014 - 13:40:45 CDT