L2/09-051

Subject: Regex in UAX 29
From: Mark Davis
Date: Tue, 27 Jan 2009

=====

We've tested and verified that the following regex pattern matches the grapheme cluster definition in UAX 29 Unicode Text Segmentation. It would be useful to add it to UAX 29.

( ( CR LF )
| ( Prepend*
    ( L+ | (L* ( ( V | LV ) V* | LVT ) T*) | T+ | [^ Control CR LF ] )
    ( Extend | SpacingMark )* )
| . )

Moreover, where there are straightforward regex patterns corresponding to other segmentation boundaries, they should be added as well. At this point, we anticipate that to be word boundaries (but not linebreak or sentence break).

Mark
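
As an illustration only, the grapheme cluster pattern above could be exercised roughly as follows. This is a minimal sketch, not the tested implementation the memo refers to. It assumes a crude, hypothetical classifier (`gcb_class`) that stands in for the real Grapheme_Cluster_Break property data: it approximates Extend and SpacingMark from general categories, covers only the main Hangul Jamo and syllable ranges, and leaves Prepend empty. Each code point is mapped to a one-letter class symbol, and the pattern is then run over that symbol string. The names `gcb_class`, `CLUSTER`, and `grapheme_clusters` and the symbol letters are invented for this sketch. Because Python's `re` takes the first alternative that succeeds rather than the longest, the sketch tries the Hangul syllable alternative before `L+`.

```python
import re
import unicodedata


def gcb_class(ch: str) -> str:
    """Return a one-letter stand-in for the Grapheme_Cluster_Break class.

    Assumption: this is a rough approximation, not real UCD property data.
    R=CR, N=LF, C=Control, L/V/T=Hangul Jamo, W=LV, X=LVT,
    E=Extend, S=SpacingMark, O=anything else. Prepend (P) is never produced.
    """
    cp = ord(ch)
    if ch == "\r":
        return "R"
    if ch == "\n":
        return "N"
    cat = unicodedata.category(ch)
    if cat in ("Cc", "Cf", "Zl", "Zp"):
        return "C"
    if 0x1100 <= cp <= 0x115F:
        return "L"
    if 0x1160 <= cp <= 0x11A7:
        return "V"
    if 0x11A8 <= cp <= 0x11FF:
        return "T"
    if 0xAC00 <= cp <= 0xD7A3:
        # Precomposed syllables: every 28th one has no trailing consonant (LV).
        return "W" if (cp - 0xAC00) % 28 == 0 else "X"
    if cat in ("Mn", "Me"):
        return "E"
    if cat == "Mc":
        return "S"
    return "O"


# The memo's pattern rewritten over the class symbols. The syllable
# alternative precedes L+ so that a leftmost-first engine does not stop
# after the leading consonants.
CLUSTER = re.compile(r"RN|P*(?:L*(?:[VW]V*|X)T*|L+|T+|[^CRN])[ES]*|.")


def grapheme_clusters(text: str) -> list[str]:
    """Split text into clusters by matching CLUSTER against its class string."""
    classes = "".join(gcb_class(ch) for ch in text)
    # One symbol per code point, so match offsets index directly into text.
    return [text[m.start():m.end()] for m in CLUSTER.finditer(classes)]


if __name__ == "__main__":
    for sample in ("a\r\nb", "e\u0301x", "\u1100\u1161\u11A8"):
        print([c.encode("unicode_escape").decode() for c in grapheme_clusters(sample)])
```

A production version would replace `gcb_class` with a lookup generated from the Unicode Character Database; the regex itself would stay the same shape.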