Re: Question about Perl5 extended UTF-8 design

From: Markus Scherer <markus.icu_at_gmail.com>
Date: Thu, 5 Nov 2015 10:15:28 -0800

On Thu, Nov 5, 2015 at 9:25 AM, Philippe Verdy <verdy_p_at_wanadoo.fr> wrote:

> (0xFF was reserved only in the old RFC version of UTF-8 when it allowed
> code points up to 31 bits, but even this RFC is obsolete and should no
> longer be used and it has never been approved by Unicode).
>

No, even in the original UTF-8 definition, "The octet values FE and FF
never appear." https://tools.ietf.org/html/rfc2279
The highest lead byte was 0xFD.
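
As a rough illustration of why 0xFD was the ceiling, here is a minimal sketch of an encoder for the obsolete RFC 2279 scheme (up to six bytes, code points up to 31 bits). The function name `encode_rfc2279` is made up for this example, and the sketch does no surrogate or other validity checking; it only shows the bit layout.

```python
def encode_rfc2279(cp: int) -> bytes:
    """Encode a code point with the obsolete RFC 2279 layout (illustrative only)."""
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("code point out of range for RFC 2279")
    if cp < 0x80:
        return bytes([cp])  # single-byte (ASCII) form
    # (exclusive upper limit, sequence length, lead-byte marker bits)
    for limit, length, marker in (
        (0x800, 2, 0xC0),
        (0x10000, 3, 0xE0),
        (0x200000, 4, 0xF0),
        (0x4000000, 5, 0xF8),
        (0x80000000, 6, 0xFC),
    ):
        if cp < limit:
            break
    # Peel off six bits at a time for the trail bytes, lowest bits first.
    trail = []
    for _ in range(length - 1):
        trail.append(0x80 | (cp & 0x3F))
        cp >>= 6
    # Whatever bits remain go into the lead byte under the marker.
    return bytes([marker | cp] + trail[::-1])


# The largest 31-bit value leaves a single payload bit for the lead byte,
# so the lead byte is 0xFC | 1 = 0xFD; FE and FF cannot be produced.
print(encode_rfc2279(0x7FFFFFFF).hex())
```

For code points that are still valid today, this layout should agree with a modern UTF-8 encoder, for example for U+20AC.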

(For the "really original" version see
http://www.unicode.org/L2/Historical/wg20-n193-fss-utf.pdf)

In the current definition, "The octet values C0, C1, F5 to FF never
appear." https://tools.ietf.org/html/rfc3629 =
https://tools.ietf.org/html/std63
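
The RFC 3629 statement can be checked by brute force. The sketch below assumes Python's built-in "utf-8" codec follows RFC 3629, encodes every Unicode scalar value (surrogates skipped), and lists the byte values that never show up. The function name `bytes_never_in_utf8` is made up for this example.

```python
def bytes_never_in_utf8() -> list:
    """Return the byte values that occur in no valid UTF-8 sequence."""
    seen = set()
    for cp in range(0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates are not scalar values and are not encodable
        seen.update(chr(cp).encode("utf-8"))
    return [b for b in range(256) if b not in seen]


print(" ".join(f"{b:02X}" for b in bytes_never_in_utf8()))
```

The loop visits about 1.1 million code points, so it may take a second or so to run.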

markus
Received on Thu Nov 05 2015 - 12:16:36 CST