-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Swift: Regular expressions library. #13470
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…n't obscure what's going on).
relevant variants, remove some duplicates, add the testing script also.
Try to check out the
See
Concrete (and small) example?
Hard to know 🤷 |
|
Thanks for the feedback and for sharing your experiences. I have several areas to investigate now... |
I minified it as far as |
|
I'm also marking this PR 'ready for review' as I'd like to start getting reviews from the Swift team. The REDOS query itself can't be PR'd until we have at least a basic version of this library in place. |
…ore like Ruby does them."
|
The above commit "Swift: Do regex locations more like Ruby does them." unexpectedly fixes the timeout issue. I wasn't aware |
This code is why that matters: codeql/shared/regex/codeql/regex/nfa/NfaUtils.qll Lines 147 to 173 in 5afdaf8
It selects a canonical representative based on locations (and |
|
Thanks for the explanation. |
MathiasVP
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments (mostly for my own understanding). If @erik-krogh is happy with the code then so am I 👍.
| * Regex("(a|b).*").firstMatch(in: myString) | ||
| * ``` | ||
| */ | ||
| abstract class RegexEval extends CallExpr { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity: Is this meant to be extended by users if they want to model custom regex evaluations, or should we only expose a final view of this so that users can't extend it outside this file?
I suppose we shouldn't extend this class in our own queries since this would cause a bunch of re-evaluation (because I assume a bunch of things in the regex library is cached?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. It isn't intended, but perhaps it should be. Do you think anything should change about the design (or perhaps just the documentation) to reflect this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to prevent users (and future us) from accidentally extending the set of regex evaluators we could replace
abstract class RegexEval extends CallExpr { ... }
/* ... */
private class AlwaysRegexEval extends RegexEval { ... }with something like:
// This is now private to prevent users from accidentally extending the domain of the class.
private abstract class RegexEvalDomain extends CallExpr { ... }
// ... and we now expose only a final view of the class.
// This means that users writing `class MyRegexEval extends RegexEval { ... }` don't extend
// the domain of the class, but instead define a subclass (like they most likely intended).
final class RegexEval = RegexEvalDomain;
/* ... */
// And actual classes that need to extend the domain of the RegexEval class can do so by extending `RegexEvalDomain` (which you can only do in this file since the `RegexEvalDomain` class is private).
private class AlwaysRegexEval extends RegexEvalDomain { ... }Note that "final extensions" is a very new QL feature so it won't be available until the next release. So we may want to wait with this change in order to not delay this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's interesting, but to be honest I don't yet see much reason to prevent users from extending the class if they want to.
Perhaps the two variables shouldn't be exposed in the public interface though:
Expr regexInput;
Expr stringInput;
as that does lock us in to that design somewhat?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's interesting, but to be honest I don't yet see much reason to prevent users from extending the class if they want to.
Yeah, it's certainly nice that it can be extended. We just need to ensure that none of our own queries in any suite extends this class as it invalidates (at the very least) the dataflow analysis that depends on the set of regex evaluators so that it'll be re-evaluated in the offending query.
Perhaps the two variables shouldn't be exposed in the public interface though:
Expr regexInput; Expr stringInput;as that does lock us in to that design somewhat?
That's true. Since the class is already abstract we could just require the presence of two abstract predicates getRegexInput() and getStringInput().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| not exists(int x, int y | | ||
| this.posixStyleNamedCharacterProperty(x, y, _) and e >= x and e < y | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like something that would be more efficiently expressed using rank, but if this doesn't perform poorly for other languages it's probably fine 👍.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good spot. I'm reluctant to change this because (1) we don't currently have Swift test coverage for this edge case (you can remove the whole not exists(...) without any tests failing) and (2) I'm not sure what a clean solution with rank would look like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've just added some new test cases around this stuff. There are a couple of spurious parse failures around posix named character properties, but none of them are affected by this particular edge case (I'm struggling to think what would be). The output of parse.ql LGTM.
I haven't really looked at the implementation, and I'm not that much into regex parsing. |
Co-authored-by: Mathias Vorreiter Pedersen <mathiasvp@github.com>
Well most of the tests pass, there are still a few
|
|
Just merged in main and fixed an utterly trivial merge conflict in |
|
Fixed the check failure (missing QLDoc in shared code I didn't actually touch...). |
MathiasVP
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
erik-krogh
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The big picture looks OK.
But I haven't looked into anything in detail.
Adds a regular expressions library for Swift, consisting of:
Regex.qll, providing aRegexEvalclass for recognizing places where regular expression evaluation takes place, and an extension to theRegExpclass for recognizing string literals that are used as regular expressions (via dataflow to theRegexEvals).RegexTreeView.qllandinternal.ParseRegexported from Ruby (crudely, for now).regexlibraries as well.Things I'd ideally like to address before merging (@erik-krogh I'd really appreciate some advice here):
prefixMatch,firstMatch,wholeMatch...). How is this addressed in other languages?Future work for future PRs (we now have issues tracking all of these):
/.*/).RegexBuilder.options: .regularExpressionis specified as an argument (i.e. model methods ofStringProtocolandNSString+RegexUseConfigflow through theNSStringconstructor).getModeFromPrefixand the Swift test case beginning(?s)).prefixMatch,firstMatch,wholeMatch) properly - see matchesFullString in the Java implementation.MISSING,SPURIOUSandhasParseFailureresults.RegexTreeView.qll: "TODO: Handle named escapes".RegexTreeView.qll: "TODO: expand to cover more properties".hasFreeSpacingFlag)?