Re: UTF-8N?

From: Peter_Constable@sil.org
Date: Thu Jun 22 2000 - 09:29:26 EDT


On 06/22/2000 02:24:49 AM <Antoine.Leca@renault.fr> wrote:

>It was my understanding that U+FEFF when received as first character
>should be seen as BOM and not as a character, and handled accordingly.

When the encoding scheme is known to be UTF-16BE or UTF-16LE, an initial
U+FEFF *must not* be interpreted as a BOM. When the encoding scheme is known
to be UTF-16 (i.e. byte order is unknown), then it *must* be interpreted as
a BOM. But in the case of UTF-8, there is no requirement either way, and so
it is ambiguous: you don't know if it's supposed to be a BOM or ZWNBSP
(unlikely as an initial character, but valid).
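
To make the three cases concrete, here is a rough sketch in Python. The
helper classify_initial_feff is hypothetical (it is not from the standard
or from any library) and simply restates the rules above. The decode calls
after it use Python's standard codecs, which happen to behave along the
same lines; "utf-8-sig" is Python's own name for UTF-8 with the signature
stripped, not a Unicode encoding scheme.

```python
# Hypothetical helper: restates the rules above, not part of any standard API.
def classify_initial_feff(scheme: str) -> str:
    """Say how an initial U+FEFF is to be read under a declared encoding scheme."""
    name = scheme.upper()
    if name in ("UTF-16BE", "UTF-16LE"):
        return "ZWNBSP"      # byte order already known: must not be read as a BOM
    if name == "UTF-16":
        return "BOM"         # byte order unknown: must be read as a BOM
    if name == "UTF-8":
        return "ambiguous"   # no requirement either way
    raise ValueError(f"scheme not covered by this sketch: {scheme!r}")


# Python's standard codecs, for comparison.
data16 = b"\xfe\xff\x00A"                # FE FF, then U+0041 in big-endian order
as_utf16 = data16.decode("utf-16")       # FE FF consumed as a BOM
as_utf16be = data16.decode("utf-16-be")  # U+FEFF kept as a character

data8 = b"\xef\xbb\xbfA"                 # EF BB BF is U+FEFF in UTF-8
as_utf8 = data8.decode("utf-8")          # U+FEFF kept as a character
as_utf8_sig = data8.decode("utf-8-sig")  # U+FEFF stripped as a signature
```

Note that with UTF-8 the caller has to choose between the last two decoders,
which is exactly the ambiguity described above.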

Peter Constable


