Re: Unicode Regular Expressions, Surrogate Points and UTF-8 from Markus Scherer on 2014-05-30 (Unicode Mail List Archive)

From: Markus Scherer <markus.icu_at_gmail.com>
Date: Fri, 30 May 2014 16:15:12 -0700

If you use Unicode 16-bit strings, it's easy to "pass through" unpaired
surrogates and treat them like code points; it's often not productive or
necessary to check for them all the time, that is, to be strict about
UTF-16.
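
As a rough illustration of that kind of pass-through, here is a minimal Python sketch. The function name code_points and the sample data are made up for this example; it is not how any particular regex engine or library does it.

```python
def code_points(units):
    """Yield code points from a sequence of 16-bit code units.

    Well-formed surrogate pairs are combined into one supplementary
    code point. An unpaired surrogate is passed through unchanged as
    its own code point instead of raising an error.
    """
    i, n = 0, len(units)
    while i < n:
        u = units[i]
        i += 1
        # Lead surrogate immediately followed by a trail surrogate?
        if 0xD800 <= u <= 0xDBFF and i < n and 0xDC00 <= units[i] <= 0xDFFF:
            u = 0x10000 + ((u - 0xD800) << 10) + (units[i] - 0xDC00)
            i += 1
        yield u


# 'a', a well-formed pair for U+1F600, a lone lead surrogate, 'b'
units = [0x0061, 0xD83D, 0xDE00, 0xD800, 0x0062]
result = list(code_points(units))
```

The lone 0xD800 comes out as the code point U+D800, with no validity check anywhere in the loop.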

On the other hand, I don't think anyone expects you to support invalid
UTF-8, and especially not to support any and all Unicode 8-bit strings (see
section 3.9, Unicode Encoding Forms, of the Unicode Standard for what I mean
here).
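
For contrast, a small Python sketch of being strict on the UTF-8 side. The helper name is_valid_utf8 is hypothetical; the only library call is Python's built-in bytes.decode, whose default error handling is strict and which rejects both stray bytes and UTF-8-style encodings of surrogates.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True only if data is well-formed UTF-8."""
    try:
        data.decode("utf-8")  # errors="strict" is the default
        return True
    except UnicodeDecodeError:
        return False


well_formed = "a\U0001F600".encode("utf-8")
encoded_surrogate = b"\xed\xa0\x80"  # byte pattern that would map to U+D800
stray_byte = b"a\xffb"               # 0xFF never appears in UTF-8

checks = [
    is_valid_utf8(well_formed),
    is_valid_utf8(encoded_surrogate),
    is_valid_utf8(stray_byte),
]
```

Only the first input is accepted. An implementation that works this way simply never sees a surrogate code point coming from UTF-8 input.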

If you find UTS #18 unclear or misleading, I suggest you submit feedback
pointing out specific text issues.

markus

Received on Fri May 30 2014 - 18:16:31 CDT