Shared: Add DataFlow::DeduplicatePathGraph #14350

asgerf · 2023-10-02T09:42:19Z

Adds a shared parameterised module, DataFlow::DeduplicatePathGraph, for post-processing a PathGraph so that it doesn't result in duplicate alerts or alerts with multiple identical paths.

This issue usually arises from using FlowState, which is embedded in the PathNode but not rendered as part of its string value. As a result, paths can differ in their intermediate flow states yet appear identical to the end user.

The issue with multiple alerts, i.e. seemingly identical rows in the #select clause, is particularly bad for tools that attempt to diff results (such as DCA) but do not perform their own deduplication in advance.

The module works by projecting each PathNode down to its (node, toString) pair, which is closer to what the end user actually sees. Viewing the path graph as an NFA whose input symbols are (node, toString) pairs, we try to minimise this NFA by merging states.
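To illustrate the state-merging idea, here is a minimal Python sketch. It is not the QL implementation in this PR: the function name `merge_states`, the integer state IDs, and the example labels are all hypothetical. It assumes a simple partition refinement in which two states are merged only if they carry the same (node, toString) label and their successors fall into the same merged groups.

```python
from collections import defaultdict

def merge_states(labels, edges):
    """Merge path-graph states that look the same to the user.

    labels: dict mapping state -> (node, toString) label (hypothetical encoding)
    edges:  set of (src, dst) state pairs
    Returns (block, merged_edges), where block maps each state to a group id.
    """
    succ = defaultdict(set)
    for src, dst in edges:
        succ[src].add(dst)

    # Initial partition: one group per distinct label.
    ids = {}
    block = {s: ids.setdefault(lab, len(ids)) for s, lab in labels.items()}

    while True:
        # Refine: split a group when its members disagree on successor groups.
        ids = {}
        new_block = {}
        for s in labels:
            sig = (block[s], frozenset(block[d] for d in succ[s]))
            new_block[s] = ids.setdefault(sig, len(ids))
        if len(set(new_block.values())) == len(set(block.values())):
            break  # no group was split, so the partition is stable
        block = new_block

    merged_edges = {(block[s], block[d]) for s, d in edges}
    return block, merged_edges

# States 2 and 3 stand for the same location reached in different flow states.
labels = {
    1: ("a.rb:1", "params"),
    2: ("a.rb:2", "x"),
    3: ("a.rb:2", "x"),
    4: ("a.rb:3", "eval"),
}
edges = {(1, 2), (1, 3), (2, 4), (3, 4)}
block, merged_edges = merge_states(labels, edges)
```

In this example, states 2 and 3 end up in the same group, so the two visually identical paths collapse into one. The sketch only merges states that agree on all their successors, which is a conservative choice: it never adds paths that were not in the original graph, but it may leave some duplicates that a more aggressive reduction would remove.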

This is needed by the JavaScript data-flow migration, but I've put this in its own PR so it can be reviewed separately. I've used the library in a Ruby query that had some very ad-hoc alert deduplication logic. Note that the expected output diff is mainly due to reordering of result sets in the output.

asgerf added 3 commits October 2, 2023 11:19

Shared: Add DataFlow::DeduplicatePathGraph

636b8b7

Shared: change note

26f15e0

Ruby: use DeduplicatePathGraph in CodeInjection query

bc0ed45

github-actions bot added documentation Ruby DataFlow Library labels Oct 2, 2023

asgerf marked this pull request as ready for review October 2, 2023 12:27

asgerf requested a review from a team as a code owner October 2, 2023 12:27
