Re: Unicode Regular Expressions, Surrogate Points and UTF-8

From: Markus Scherer <markus.icu_at_gmail.com>
Date: Sat, 31 May 2014 19:28:27 -0700

On Sat, May 31, 2014 at 1:59 AM, Richard Wordingham <
richard.wordingham_at_ntlworld.com> wrote:

> Bear in mind that a pattern \uD808 shall not match anything in a
> well-formed Unicode string.

Depends. See the definitions of Unicode strings vs. UTF strings.
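
A minimal Java sketch of that distinction, assuming the Unicode Standard's
definitions: a Unicode 16-bit string is any sequence of 16-bit code units,
while a UTF-16 string must also be well-formed, with no unpaired surrogates.
The class and method names are hypothetical.

```java
final class UnicodeStringKinds {
    /** True if every surrogate code unit in s belongs to a high+low pair. */
    static boolean isWellFormedUtf16(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                boolean paired = i + 1 < s.length()
                        && Character.isLowSurrogate(s.charAt(i + 1));
                if (!paired) {
                    return false; // high surrogate with no low half
                }
                i++; // skip the low half of the pair
            } else if (Character.isLowSurrogate(c)) {
                return false; // low surrogate with no preceding high half
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // A Java String is a Unicode 16-bit string: it may hold a lone surrogate.
        String lone = String.valueOf((char) 0xD808);
        // U+12345 encoded as the surrogate pair D808 DF45.
        String pair = new String(Character.toChars(0x12345));

        System.out.println(isWellFormedUtf16(lone));
        System.out.println(isWellFormedUtf16(pair));
    }
}
```

Under these definitions a pattern for a lone D808 could only ever match in
the first kind of string.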

> \uD808\uDF45 specifies a sequence of two
> codepoints.

Implementations that use Unicode 16-bit strings will usually treat this as
one supplementary code point.
In Java, there is no other way to escape one.
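
A small Java sketch of that behaviour, assuming java.util.regex as shipped
with Java 7 or later. The class, constant and method names are hypothetical.
The test string is built from the code point U+12345, so the only escapes
involved are the ones the regex parser sees.

```java
import java.util.regex.Pattern;

final class SurrogateEscapeDemo {
    /** U+12345, stored by Java as the two code units D808 DF45. */
    static final String CUNEIFORM = new String(Character.toChars(0x12345));

    /** Regex text made of two consecutive backslash-u escapes (D808, DF45). */
    static final String ESCAPED_PAIR = "\\uD808\\uDF45";

    /** True if the regex matches the entire input. */
    static boolean wholeMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Two UTF-16 code units, but a single code point.
        System.out.println(CUNEIFORM.length());
        System.out.println(CUNEIFORM.codePointCount(0, CUNEIFORM.length()));

        // The escaped pair matches the supplementary character.
        System.out.println(wholeMatch(ESCAPED_PAIR, CUNEIFORM));

        // Inside a character class the pair still acts as one member:
        // a class holding two separate surrogates could not match here.
        System.out.println(wholeMatch("[" + ESCAPED_PAIR + "]", CUNEIFORM));
    }
}
```

The character-class check is the informative one, since a plain sequence
match would succeed under either reading of the pattern.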

markus

Received on Sat May 31 2014 - 21:29:29 CDT