Re: Unicode Regular Expressions, Surrogate Points and UTF-8

From: Markus Scherer
Date: Sat, 31 May 2014 19:28:27 -0700

On Sat, May 31, 2014 at 1:59 AM, Richard Wordingham wrote:

> Bear in mind that a pattern \uD808 shall not match anything in a
> well-formed Unicode string.

Depends. See the definitions of Unicode strings vs. UTF strings.

> \uD808\uDF45 specifies a sequence of two
> codepoints.

Implementations that use Unicode 16-bit strings will usually treat this as
one supplementary code point.
In Java, there is no other way to escape one.
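
As a small illustration of the point above, here is a sketch in Java, assuming a Java 7 or later runtime. The class name SurrogateEscapeDemo and its helper methods are hypothetical and exist only for this example. It checks that the escaped pair occupies two 16-bit code units but is seen as one supplementary code point (U+12345), both by the String API and by java.util.regex.

```java
import java.util.regex.Pattern;

public class SurrogateEscapeDemo {
    // Two escaped surrogate code units in a Java string literal.
    static final String PAIR = "\uD808\uDF45";

    // Number of 16-bit code units in the string.
    static int codeUnits() {
        return PAIR.length();
    }

    // Number of code points; a well-formed surrogate pair counts as one.
    static int codePoints() {
        return PAIR.codePointCount(0, PAIR.length());
    }

    // The supplementary code point that the pair encodes.
    static int firstCodePoint() {
        return PAIR.codePointAt(0);
    }

    // A single "." has to consume the whole pair for a full match.
    static boolean dotMatchesWholePair() {
        return Pattern.matches(".", PAIR);
    }

    // Here the regex engine (not the compiler) sees the two escapes
    // and is expected to combine them into one code point.
    static boolean escapedPairMatches() {
        return Pattern.matches("\\uD808\\uDF45", PAIR);
    }

    public static void main(String[] args) {
        System.out.println("code units:  " + codeUnits());
        System.out.println("code points: " + codePoints());
        System.out.println("code point:  U+"
                + Integer.toHexString(firstCodePoint()).toUpperCase());
        System.out.println("dot matches pair:     " + dotMatchesWholePair());
        System.out.println("escaped pair matches: " + escapedPairMatches());
    }
}
```

This only demonstrates the behavior of one 16-bit-string implementation; it says nothing about how an engine working on UTF-8 or UTF-32 strings should treat the same pattern.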


Unicode mailing list
Received on Sat May 31 2014 - 21:29:29 CDT