Re: UAX44: loose matching of symbolic values and the `is` prefix

From: Mathias Bynens <mathias_at_qiwi.be>
Date: Mon, 6 Jun 2016 18:25:12 +0300

> On 6 Jun 2016, at 18:04, Ken Whistler <kenwhistler_at_att.net> wrote:
>
> UAX #44 doesn't *require* any regex engine to include this "is prefix" handling.

Are you referring to the fact that the first paragraph on http://unicode.org/reports/tr44/#Matching_Rules uses “strongly recommended” and “should” instead of “required” and “must”?

> What UAX #44 does is recommend that all property and property value aliases be correctly recognized, and then specifies a clear statement (in UAX44-LM3) of the loose matching rule for recognizing the various forms of those aliases that could be considered equivalent. I don't think messing with that rule statement (which has been in place since 2010) would be helpful.

Why not? What I had in mind was adding a small sentence like:

> For compatibility reasons, implementations may optionally support any initial prefix string "is".

This wouldn’t be a breaking change. It would let new implementations that aim to follow UAX44 do so without having to support `is`, and it would solve the problem everywhere the matching rules are applied, not just in regular expressions.
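
To make the suggestion concrete, here is a minimal sketch in Python of loose matching where the `is` prefix handling is opt-in. The names `loose_key`, `loose_match`, and the `strip_is_prefix` flag are hypothetical and only for illustration. The normalization steps reflect my reading of UAX44-LM3 (ignore case, whitespace, underscores, hyphens, and an initial "is") and are not normative text.

```python
import re


def loose_key(name: str, strip_is_prefix: bool = True) -> str:
    """Reduce a property name or value alias to a key for loose comparison.

    Hypothetical helper; the steps are an interpretation of UAX44-LM3.
    """
    # Ignore case, whitespace, underscores, and hyphens.
    key = re.sub(r"[\s_\-]+", "", name).lower()
    # Optionally ignore an initial "is" prefix. The length guard is an
    # assumption made here so that a bare "is" is not reduced to "".
    if strip_is_prefix and key.startswith("is") and len(key) > 2:
        key = key[2:]
    return key


def loose_match(a: str, b: str, strip_is_prefix: bool = True) -> bool:
    """Compare two aliases under the sketched loose matching rule."""
    return loose_key(a, strip_is_prefix) == loose_key(b, strip_is_prefix)


if __name__ == "__main__":
    # With prefix handling, "Is_Greek" and "greek" compare equal.
    print(loose_match("Is_Greek", "greek"))
    # Without it, they do not.
    print(loose_match("Is_Greek", "greek", strip_is_prefix=False))
```

An implementation that opts out would simply pass `strip_is_prefix=False` everywhere, which is the behavior the suggested sentence would permit.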

> I think the target of concern here is wrong.

Not sure I agree. It seems to me the `is` prefix is problematic (for the same reasons) wherever it’s used, whether that’s in regular expressions or not.

> The target instead should be in UTS #18, which happily, has a proposed update available for comment right now:
>
> http://www.unicode.org/review/pri325/
>
> The relevant point is:
>
> http://www.unicode.org/reports/tr18/tr18-18.html#RL1.2
>
> That is the conformance part that requires that conformant Unicode regex implementations "must follow the Matching rules from [UAX44]".

Thanks for the pointer! I will submit my feedback there as well. It seems more awkward and difficult to add an exception there than to slightly tweak the UAX44-LM3 text as suggested above, though.
Received on Mon Jun 06 2016 - 10:25:47 CDT
