Re: Private Use Surrogate Pairs

From: Doug Ewell (dewell@adelphia.net)
Date: Thu May 09 2002 - 00:59:19 EDT


Peter_Constable at sil dot org wrote:

> I think Jim is asking for clarification in the text of the Standard
> and not just in a response to him, but in case anyone isn't sure,
> the four that are excluded are U+FFFFE, U+FFFFF, U+10FFFE and
> U+10FFFF.
>
> And don't bother asking for a good reason *why* they are excluded:
> there isn't any good reason why; they just are.

I know it's popular to say there's no good reason for these to be
excluded, but excluding the U+xxFFFE code points at least helps prevent
UTF-32LE text that starts with one from being misdetected as big-endian
UTF-16 with a BOM:

    Big-endian UTF-16 with BOM: FE FF .. ..
    U+xxFFFE in UTF-32LE:       FE FF xx 00
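
For anyone who wants to see the collision byte by byte, here is a minimal
Python sketch. The helper names utf32le_bytes and looks_like_utf16be_bom are
made up for illustration, and the check is a naive two-byte sniff, not the
logic of any particular detector:

```python
import codecs

def utf32le_bytes(code_point: int) -> bytes:
    """Lay out one code point as four little-endian bytes (the UTF-32LE form).

    Uses int.to_bytes rather than a codec, so no codec policy about
    noncharacters gets in the way of the illustration.
    """
    return code_point.to_bytes(4, "little")

def looks_like_utf16be_bom(data: bytes) -> bool:
    """Naive sniff (an assumption): do the first two bytes equal FE FF?"""
    return data.startswith(codecs.BOM_UTF16_BE)

# A few U+xxFFFE code points, including the two from planes 15 and 16.
for cp in (0xFFFE, 0x1FFFE, 0xFFFFE, 0x10FFFE):
    raw = utf32le_bytes(cp)
    print(f"U+{cp:04X}: {raw.hex(' ').upper()}  "
          f"sniffed as UTF-16BE BOM: {looks_like_utf16be_bom(raw)}")
```

Each of these starts with FE FF, so the naive sniff fires on all of them.
The U+xxFFFF code points start with FF FF instead and do not collide this
way, which is why the argument only covers the U+xxFFFE half.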

-Doug Ewell
 Fullerton, California


