There are 102 characters that map to a sequence when casefolded. The question for these is whether a caseless regex match can and should match them.
I would rephrase that. The question is "Under what circumstances should a regex match them?" or "Which regex should match them?".
As I remarked earlier, I think it would take some work to put together a solid proposal on how to handle these in regular expressions, so that, say, any expression that matched:
* OFFICE // 6 chars
would also match
* oﬃce // 4 chars, including "ffi" ligature
and vice versa.
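To make the example concrete, here is a minimal Python sketch. It assumes Python's built-in `str.casefold()`, which applies full case folding, as a stand-in for the folding under discussion; the variable names are only illustrative.

```python
upper = "OFFICE"
ligature = "o\ufb03ce"  # "oﬃce", spelled with U+FB03 LATIN SMALL LIGATURE FFI

# The two strings have different lengths in code points.
lengths = (len(upper), len(ligature))

# Full case folding maps U+FB03 to the three-letter sequence "ffi",
# so both strings fold to the same text.
same_after_full_folding = upper.casefold() == ligature.casefold()

print(lengths, same_after_full_folding)
```

Under full case folding the two spellings become identical, which is why a caseless match built directly on that folding would treat them as equal.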
And I would reply that this is an entirely different proposal.
There are two different levels of equivalency here.
OFFICE and office (same number of characters) differ in case.
office and oﬃce (6 and 4 characters) differ by a ligature.
Just because ligatures do not have standard case pairs (there's no FFI ligature) should not mean that caseless matching also becomes ligature-blind.
In other words, if a regex such as /office/ doesn't match 'oﬃce' (with the ligature), then making the search caseless should not necessarily bring it into the match.
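As one data point, Python's standard `re` module already behaves this way: its `re.IGNORECASE` flag matches character for character and does not expand the ligature. The sketch below only illustrates that one engine; other regex engines may differ.

```python
import re

ligature_text = "o\ufb03ce"  # "oﬃce", with U+FB03 LATIN SMALL LIGATURE FFI

# Case-sensitive search: the ligature spelling is not matched.
plain = re.search("office", ligature_text)

# Caseless search: still not matched, because the flag does not
# turn the single ligature character into the sequence "ffi".
caseless = re.search("office", ligature_text, re.IGNORECASE)

# The caseless flag does what one expects for ordinary case differences.
case_only = re.fullmatch("office", "OFFICE", re.IGNORECASE)

print(plain, caseless, case_only is not None)
```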
Note, I'm not saying you shouldn't be able to easily express a search mode that ignores ligatures, but ignoring ligatures should not be what caseless matching does by default.
For user-friendly Unicode regex you may need a mode that ignores several different aspects of how a character can be represented in Unicode all at once. This gets back to the discussion of selective foldings.
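A rough Python sketch of what selective folding could look like follows. Everything in it is an assumption made for illustration: the `fold` and `expand_ligatures` helpers are hypothetical, the ligature set is limited to the Latin ligatures U+FB00..U+FB06, and the case-only fold approximates simple case folding by leaving alone any character whose full folding is longer than one character.

```python
import unicodedata

# Hypothetical choice: treat only the Latin ligatures U+FB00..U+FB06 as ligatures.
LATIN_LIGATURES = range(0xFB00, 0xFB07)

def expand_ligatures(text):
    """Replace each Latin ligature with its compatibility decomposition."""
    return "".join(
        unicodedata.normalize("NFKD", ch) if ord(ch) in LATIN_LIGATURES else ch
        for ch in text
    )

def fold_case_only(text):
    """Approximate simple case folding: fold a character only when
    its folding is a single character, otherwise keep it unchanged."""
    return "".join(
        ch.casefold() if len(ch.casefold()) == 1 else ch
        for ch in text
    )

def fold(text, *, case=False, ligatures=False):
    """Apply only the foldings the caller asked for."""
    if ligatures:
        text = expand_ligatures(text)
    if case:
        text = fold_case_only(text)
    return text

def equivalent(a, b, **foldings):
    return fold(a, **foldings) == fold(b, **foldings)

lig = "o\ufb03ce"  # "oﬃce", with U+FB03 LATIN SMALL LIGATURE FFI
print(equivalent("OFFICE", "office", case=True))             # case only
print(equivalent("OFFICE", lig, case=True))                  # ligature kept distinct
print(equivalent("OFFICE", lig, case=True, ligatures=True))  # both foldings
```

With this shape, caseless comparison alone keeps the ligature distinct, and a caller who wants ligature-blind matching asks for it explicitly.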