Ah! That explains why
pcre2grep -u '^\X{1}$'
matches with
🇬🇧
🇩🇪🇫🇷
🇨🇳🇮🇹🇲🇾
🇪🇸🇦🇺🇷🇺🇳🇱🇯🇵
...etc...
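The behaviour above is consistent with the literal `RI-Sequence := Regional_Indicator+` reading that David points out in the quoted message below. Here is a minimal sketch that uses Python's standard `re` module rather than PCRE2, and is not a full `\X` implementation. It only compares the two readings of RI-Sequence on the same flag strings. The helper `flags` is hypothetical and exists only to build the samples from ISO country codes.

```python
import re

# Hypothetical helper (not part of any Unicode API): build a flag string
# from ISO 3166 alpha-2 codes by mapping A..Z onto U+1F1E6..U+1F1FF.
def flags(*codes):
    return "".join(chr(0x1F1E6 + ord(c) - ord("A")) for code in codes for c in code)

# Character class for the Regional_Indicator code points U+1F1E6..U+1F1FF.
RI = "[\U0001F1E6-\U0001F1FF]"

greedy = re.compile(f"{RI}+")      # literal reading: Regional_Indicator+
paired = re.compile(f"{RI}{{2}}")  # pairwise reading: exactly two RIs

samples = [
    flags("GB"),
    flags("DE", "FR"),
    flags("CN", "IT", "MY"),
    flags("ES", "AU", "RU", "NL", "JP"),
]

for s in samples:
    # fullmatch stands in for the ^...$ anchors of the pcre2grep pattern.
    print(len(s), bool(greedy.fullmatch(s)), bool(paired.fullmatch(s)))
```

All four samples fully match `Regional_Indicator+`, so each would count as a single cluster under that reading. Only the two-code-point sample (GB) fully matches the pairwise form.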
André Schappo
On 17 Dec 2017, at 17:17, Mark Davis ☕️ via Unicode <unicode_at_unicode.org> wrote:
Thanks for the feedback. You're correct about this; that is a holdover from an earlier version of the document when there was a more basic treatment of RI sequences.
There is already an action to modify these. There is a placeholder review note about that just above
http://www.unicode.org/reports/tr29/proposed.html#Table_Combining_Char_Sequences_and_Grapheme_Clusters
(scroll up just a bit).
Mark
On Sun, Dec 17, 2017 at 4:16 PM, David P. Kendal via Unicode <unicode_at_unicode.org> wrote:
Hi,
It’s possible I’m missing something, but the formal grammar/regular
expression given for extended grapheme clusters appears to have a bug
in it.
<https://unicode.org/reports/tr29/#Table_Combining_Char_Sequences_and_Grapheme_Clusters>
The bug is here:
RI-Sequence := Regional_Indicator+
If the formal grammar is intended to exactly match the rules given in
the “Grapheme Cluster Boundary Rules” section below it as-is, then
this should be
RI-Sequence := Regional_Indicator Regional_Indicator
since as given it would cause any number of RI characters to coalesce
into a single grapheme cluster, instead of pairs of characters. That
is, the text U+1F1EC U+1F1E7 U+1F1EA U+1F1FA would represent one
grapheme cluster instead of the correct two.
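The pairing behaviour described here (rules GB12 and GB13 in the boundary-rules section) can be sketched as a left-to-right scan. This is a minimal Python sketch, not a full UAX #29 segmenter. It assumes only the regional-indicator rules matter, so every other character becomes its own cluster. The function names `segment_ri` and `segment_ri_greedy` are hypothetical. A break before an RI is suppressed only when an odd number of RIs immediately precede it, and that is what groups RIs into pairs. The greedy variant mimics `Regional_Indicator+`.

```python
RI_FIRST, RI_LAST = 0x1F1E6, 0x1F1FF

def is_ri(ch):
    """True if ch is a Regional_Indicator code point (U+1F1E6..U+1F1FF)."""
    return RI_FIRST <= ord(ch) <= RI_LAST

def segment_ri(text):
    """Pairwise RI clustering (GB12/GB13 style); other break rules ignored."""
    clusters = []
    run = 0  # consecutive RIs immediately before the current character
    for ch in text:
        if is_ri(ch) and run % 2 == 1:
            # Odd number of preceding RIs: no break, complete the pair.
            clusters[-1] += ch
        else:
            clusters.append(ch)
        run = run + 1 if is_ri(ch) else 0
    return clusters

def segment_ri_greedy(text):
    """Greedy reading of RI-Sequence := Regional_Indicator+ for comparison."""
    clusters = []
    prev_ri = False
    for ch in text:
        if is_ri(ch) and prev_ri:
            clusters[-1] += ch  # any run of RIs coalesces into one cluster
        else:
            clusters.append(ch)
        prev_ri = is_ri(ch)
    return clusters

text = "\U0001F1EC\U0001F1E7\U0001F1EA\U0001F1FA"
print(len(segment_ri(text)), len(segment_ri_greedy(text)))
```

For U+1F1EC U+1F1E7 U+1F1EA U+1F1FA, `segment_ri` returns two clusters (🇬🇧 and 🇪🇺), while `segment_ri_greedy` returns one.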
--
dpk (David P. Kendal) · Nassauische Str. 36, 10717 DE · http://dpk.io/ · +49 159 03847809
we do these things not because they are easy,
but because we thought they were going to be easy
— ‘The Programmers’ Credo’, Maciej Cegłowski
🌏 🌍 🌎
André Schappo
https://schappo.blogspot.co.uk
https://twitter.com/andreschappo
https://weibo.com/andreschappo
https://groups.google.com/forum/#!forum/computer-science-curriculum-internationalization
Received on Mon Dec 18 2017 - 03:59:50 CST