Shared: Add DataFlow::DeduplicatePathGraph #14350

Open: wants to merge 3 commits into main

@asgerf (Contributor) commented on Oct 2, 2023

Adds a shared parameterised module, DataFlow::DeduplicatePathGraph, for post-processing a PathGraph so that it doesn't result in duplicate alerts or alerts with multiple identical paths.

This issue usually arises from using FlowState, which is embedded in the PathNode but not rendered as part of its string value. As a result, paths that differ only in their intermediate flow states can appear identical to the end user.
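To make the problem concrete, here is a small Python illustration (not QL). The `PathNode` class below is a hypothetical stand-in for a CodeQL path node, reduced to a program node plus a flow state, and the state names `"A"` and `"B"` are made up. The state takes part in equality but not in the rendered string, so two internally distinct paths print the same way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathNode:
    """Hypothetical stand-in for a path node: a data-flow node plus a flow state."""
    node: str
    state: str

    def __str__(self) -> str:
        # The flow state is part of the node's identity but is not rendered.
        return self.node


def render(path: tuple[PathNode, ...]) -> str:
    """What the end user sees: the string values of the nodes along the path."""
    return " -> ".join(str(n) for n in path)


# Two paths through the same program nodes, differing only in an intermediate state.
path1 = (PathNode("source", "A"), PathNode("x", "A"), PathNode("sink", "A"))
path2 = (PathNode("source", "A"), PathNode("x", "B"), PathNode("sink", "A"))

print(path1 == path2)                  # False: distinct paths internally
print(render(path1) == render(path2))  # True: identical to the end user
```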

The issue with multiple alerts, i.e. seemingly identical rows in the #select result set, is particularly bad for tools that diff results (such as DCA) but do not deduplicate them first.
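A minimal Python sketch of the row-level symptom, assuming each alert row is a (source, sink) pair and each path node is a plain `(node, flow_state)` tuple; the rows and states are hypothetical. The result set has two distinct rows, but once the unrendered flow state is dropped only one row is visible, which is what a diffing tool effectively compares.

```python
# Each alert row pairs a source and a sink; a path node is (node, flow_state).
# The flow states are hypothetical, chosen only to illustrate the problem.
rows = [
    (("source", "A"), ("sink", "A")),
    (("source", "B"), ("sink", "B")),
]


def rendered(row):
    """Project a row to what the end user sees: the nodes, without flow states."""
    (src_node, _), (sink_node, _) = row
    return (src_node, sink_node)


distinct_rows = set(rows)
distinct_rendered = {rendered(r) for r in rows}

print(len(distinct_rows))      # 2 rows in the result set
print(len(distinct_rendered))  # 1 row once flow states are hidden
```

Projecting rows like this only hides the duplicates in the #select output; the approach described in this PR instead works on the path graph itself, so the paths shown for each alert are deduplicated as well.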

The module works by projecting each PathNode down to its (node, toString) pair, which is closer to what the end user actually sees. Viewing the path graph as an NFA whose input symbols are (node, toString) pairs, it then tries to minimise this NFA by merging states.
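The PR description does not spell out the merging algorithm, so the following is only a sketch of one standard way to merge NFA states: partition refinement up to forward bisimulation. It is written in Python rather than QL, with plain dicts standing in for the path graph, and `merge_states` and `quotient_edges` are hypothetical names, not part of the module's API. States start out grouped by their projected label, and a group is split whenever its members reach different sets of groups. It assumes the label already encodes anything else that must be preserved, such as whether a state is a sink.

```python
def merge_states(labels, edges):
    """Coarsest partition of states such that merged states share a projected
    label and have successors in the same set of blocks.

    labels: state -> projected label, e.g. (node, toString)
    edges:  state -> iterable of successor states
    Returns state -> block id; states with equal ids can be merged.
    """
    # Start by grouping states that look identical to the end user.
    label_ids = {}
    block = {s: label_ids.setdefault(lab, len(label_ids)) for s, lab in labels.items()}
    while True:
        # Refine: split a block if its states reach different sets of blocks.
        sig_ids = {}
        new_block = {}
        for s in labels:
            sig = (block[s], frozenset(block[t] for t in edges.get(s, ())))
            new_block[s] = sig_ids.setdefault(sig, len(sig_ids))
        if len(sig_ids) == len(set(block.values())):
            return new_block  # no block was split: the partition is stable
        block = new_block


def quotient_edges(edges, block):
    """Edges of the merged graph, expressed between block ids."""
    return {(block[s], block[t]) for s, succs in edges.items() for t in succs}


# Hypothetical example: one source reaching one sink under two flow states.
labels = {
    ("src", "A"): ("src", "src"), ("src", "B"): ("src", "src"),
    ("snk", "A"): ("snk", "snk"), ("snk", "B"): ("snk", "snk"),
}
edges = {("src", "A"): [("snk", "A")], ("src", "B"): [("snk", "B")]}
block = merge_states(labels, edges)
print(len(set(block.values())))     # 2: the four states collapse into two
print(quotient_edges(edges, block))  # {(0, 1)}: a single rendered path
```

Merging only bisimilar states never changes the set of rendered paths, but it does not always produce a minimal NFA. Exact NFA minimisation is computationally hard in general, so merging bisimilar states is a common practical approximation.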

This is needed by the JavaScript data-flow migration, but I've put it in its own PR so it can be reviewed separately. I've used the library in a Ruby query that had some very ad-hoc alert deduplication logic. Note that the diff in the expected output is mainly due to reordering of result sets.

@asgerf asgerf marked this pull request as ready for review October 2, 2023 12:27
@asgerf asgerf requested a review from a team as a code owner October 2, 2023 12:27