Skip to content

ReDoS: add a shared regex pack#11061

Merged
yoff merged 3 commits into
github:mainfrom
erik-krogh:shared-redosMod
Nov 14, 2022
Merged

ReDoS: add a shared regex pack#11061
yoff merged 3 commits into
github:mainfrom
erik-krogh:shared-redosMod

Conversation

@erik-krogh
Copy link
Copy Markdown
Contributor

@erik-krogh erik-krogh commented Nov 1, 2022

This PR introduces a new shared regex pack containing various shared code that analyze regular expressions.
All these analyses depend on a single RegexTreeViewSig signature, that describes a regex as a tree structure.

I don't expect that a shared regex parser will ever make it into this pack.

This is just the shared pack, there are separate PRs that will integrate this shared pack with each language (see bottom).
A complete PR, that combines all the part, can be found here: #10604 (ugly commit history).

I ended up basing locations on hasLocationInfo as not all languages had Location objects for all the regex terms.
This also required some small rewrites in the implementation.

The class hierarchy inside RegexTreeViewSig was needed in order to have some hierarchy that all the languages could agree on, and that's why I had to introduce a Top class.
(RegExpParent has a slightly different hierarchy in JS).


This PR should only be merged when a stable CLI supports all the required features, and when all the major core bugs have been fixed.


I've made separate PRs that port each language to the shared pack:
JavaScript, Ruby, Python, Java.


TODO:

@erik-krogh erik-krogh marked this pull request as ready for review November 1, 2022 11:47
@erik-krogh erik-krogh requested a review from a team November 1, 2022 11:47
@calumgrant calumgrant requested review from aibaars and yoff November 7, 2022 09:37
Copy link
Copy Markdown
Contributor

@aibaars aibaars left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. I reviewed this by comparing the original files (indented with 2 spaces to match up better) to the new files. The files are mostly identical. The main differences were :

  • changing extends to instanceof
    • adding trivial definitions for some predicates that simply forward to the super class
    • replacing this with super in some cases
  • order by clauses use a location string instead of the individual components.

Comment on lines +124 to +126
string toString() { result = this.(RegExpConstant).toString() }

RegExpTerm getRootTerm() { result = this.(RegExpConstant).getRootTerm() }
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see we have similar definitions for toString and getRootTerm for the other classes. It feels a bit repetitive, not sure if it can be avoided though.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if it can be avoided though.

It can't be avoided with the current structure of the code.
it's not possible to extend a parameter in a parameterised module, so I have to use instanceof, which doesn't "forward" the member-predicates.

I might have written the code using less classes (and more predicates) if I had done it as a parameterised module from the start, but with this change I can change as little as possible compared to the existing implementation.

string getTermLocationString(RegExpTerm t) {
exists(string file, int startLine, int startColumn, int endLine, int endColumn |
t.hasLocationInfo(file, startLine, startColumn, endLine, endColumn) and
result = file + ":" + startLine + ":" + startColumn + "-" + endLine + ":" + endColumn
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use an order by clause like order by file, startLine, startColumn, endLine, endColumn instead of creating strings?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to change the code because It didn't work out to use getLocation() (not all RegExpTerm has a Location).
The end result of creating a string is the same as a complex order-by, and I didn't want to write a big exists(..) inside the order-by. I thought this approach was cleaner.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really mind, just wondered about potential performance implications of creating those strings and also the ordering of line numbers is a bit funny when using a string "9" > "9"`.

I had expected to see code like

    term =
      min(RelevantRegExpTerm t, string file, int startLine, int startColumn, int endLine, int endColumn |
        str = getCanonicalizationString(t) and t.hasLocationInfo(file, startLine, startColumn, endLine, endColumn)
      |
        t order by file, startLine, startColumn, endLine, endColumn
      )

I don't really mind though. Sorting by string avoids quite a bit of boilerplate variable declarations, haslocationInfo calls and long order by clauses. Feel free to keep things the way they are.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just wondered about potential performance implications of creating those strings

I suspect that it's slower to create those strings, but I haven't noticed a significant slowdown.

the ordering of line numbers is a bit funny when using a string "9" > "9"`.

Which ordering is used doesn't matter, I just need an arbitrary total order.

So I want to keep it as is.

@erik-krogh
Copy link
Copy Markdown
Contributor Author

The main differences were :

  • changing extends to instanceof

    • adding trivial definitions for some predicates that simply forward to the super class
    • replacing this with super in some cases
  • order by clauses use a location string instead of the individual components.

Yep, that's right 👍

@yoff
Copy link
Copy Markdown
Contributor

yoff commented Nov 14, 2022

The main differences were :

  • changing extends to instanceof

    • adding trivial definitions for some predicates that simply forward to the super class
    • replacing this with super in some cases
  • order by clauses use a location string instead of the individual components.

Yep, that's right 👍

That seems completely fine to me :-)

@yoff yoff merged commit dd525a4 into github:main Nov 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants