ReDoS: add a shared regex pack#11061
Conversation
a9e0c80 to
f5daee2
Compare
aibaars
left a comment
There was a problem hiding this comment.
This looks good to me. I reviewed this by comparing the original files (indented with 2 spaces to match up better) to the new files. The files are mostly identical. The main differences were :
- changing
extendstoinstanceof- adding trivial definitions for some predicates that simply forward to the super class
- replacing
thiswithsuperin some cases
- order by clauses use a location string instead of the individual components.
| string toString() { result = this.(RegExpConstant).toString() } | ||
|
|
||
| RegExpTerm getRootTerm() { result = this.(RegExpConstant).getRootTerm() } |
There was a problem hiding this comment.
I see we have similar definitions for toString and getRootTerm for the other classes. It feels a bit repetitive, not sure if it can be avoided though.
There was a problem hiding this comment.
not sure if it can be avoided though.
It can't be avoided with the current structure of the code.
it's not possible to extend a parameter in a parameterised module, so I have to use instanceof, which doesn't "forward" the member-predicates.
I might have written the code using less classes (and more predicates) if I had done it as a parameterised module from the start, but with this change I can change as little as possible compared to the existing implementation.
| string getTermLocationString(RegExpTerm t) { | ||
| exists(string file, int startLine, int startColumn, int endLine, int endColumn | | ||
| t.hasLocationInfo(file, startLine, startColumn, endLine, endColumn) and | ||
| result = file + ":" + startLine + ":" + startColumn + "-" + endLine + ":" + endColumn |
There was a problem hiding this comment.
Why not use an order by clause like order by file, startLine, startColumn, endLine, endColumn instead of creating strings?
There was a problem hiding this comment.
I had to change the code because It didn't work out to use getLocation() (not all RegExpTerm has a Location).
The end result of creating a string is the same as a complex order-by, and I didn't want to write a big exists(..) inside the order-by. I thought this approach was cleaner.
There was a problem hiding this comment.
I don't really mind, just wondered about potential performance implications of creating those strings and also the ordering of line numbers is a bit funny when using a string "9" > "9"`.
I had expected to see code like
term =
min(RelevantRegExpTerm t, string file, int startLine, int startColumn, int endLine, int endColumn |
str = getCanonicalizationString(t) and t.hasLocationInfo(file, startLine, startColumn, endLine, endColumn)
|
t order by file, startLine, startColumn, endLine, endColumn
)I don't really mind though. Sorting by string avoids quite a bit of boilerplate variable declarations, haslocationInfo calls and long order by clauses. Feel free to keep things the way they are.
There was a problem hiding this comment.
just wondered about potential performance implications of creating those strings
I suspect that it's slower to create those strings, but I haven't noticed a significant slowdown.
the ordering of line numbers is a bit funny when using a string "9" > "9"`.
Which ordering is used doesn't matter, I just need an arbitrary total order.
So I want to keep it as is.
Yep, that's right 👍 |
That seems completely fine to me :-) |
This PR introduces a new shared
regexpack containing various shared code that analyze regular expressions.All these analyses depend on a single
RegexTreeViewSigsignature, that describes a regex as a tree structure.I don't expect that a shared regex parser will ever make it into this pack.
This is just the shared pack, there are separate PRs that will integrate this shared pack with each language (see bottom).
A complete PR, that combines all the part, can be found here: #10604 (ugly commit history).
I ended up basing locations on
hasLocationInfoas not all languages hadLocationobjects for all the regex terms.This also required some small rewrites in the implementation.
The class hierarchy inside
RegexTreeViewSigwas needed in order to have some hierarchy that all the languages could agree on, and that's why I had to introduce aTopclass.(
RegExpParenthas a slightly different hierarchy in JS).This PR should only be merged when a stable CLI supports all the required features, and when all the major core bugs have been fixed.
I've made separate PRs that port each language to the shared pack:
JavaScript, Ruby, Python, Java.
TODO:
${workspace}for intra-workspace dependencies #11004