Data flow: Refactoring + performance improvements#2903
Conversation
7a3f125 to
bdeebc0
Compare
6ebea81 to
17d491d
Compare
|
Differences jobs (all internal links): Analysis times on Mono and JDK have both been reduced by around 7 %. |
- Introduce `ReadTaintNode` and `TaintStoreNode` to simplify logic for taint getters and taint setters, respectively. - `nodeCandFwd2`: Restrict `stored` column after a read, based on what it might be before a store of the same field. - `nodeCand2`: Restrict `read` column (renamed from `stored`) after a store, based on what it might be after a read of the same field. - Move big step predicates into a `LocalFlowBigStep` module. - Define predicates by dispatch in `AccessPath[Front]` class. - `flowCandFwd0`: Restrict `apf` column after a read, as it should be able to match a Boolean `read` column from `nodeCand2`. - `flowFwd0`: Restrict columns `ap` and `apf` after a read, by introducing a `flowConsCandFwd` predicate (similar to what is done in the previous pruning steps). - `flowFwd0`: Restrict columns `ap` and `apf` after a store, by introducing a `flowConsCand` predicate (similar to what is done in the previous pruning steps).
17d491d to
f935f5e
Compare
|
The C++ dist builds appear to be within the noise threshold, so no objections from me. |
|
A few removed results on JDK. |
aschackmull
left a comment
There was a problem hiding this comment.
Some initial comments; I haven't read all the way through yet.
| } | ||
|
|
||
| pragma[noinline] | ||
| private predicate readStoreNode( |
There was a problem hiding this comment.
This could use some qldoc.
| nodeCand1(arg, config) and | ||
| readStoreNode(call, arg, f1, config) and | ||
| readStoreCand1(f1, unbind(config)) |
There was a problem hiding this comment.
Any reason not to fold the remaining nodeCand1 and readStoreCand1 lines into readStoreNode?
There was a problem hiding this comment.
Because that would give a bad join-order. readStoreNode is only there to give a proper join-order.
There was a problem hiding this comment.
Also using unbind? Ignoring configs (which unbind allows us to do) then all the joins are unary filters, so I don't see how that could be the case.
There was a problem hiding this comment.
Using unbind, i.e.
TReadStoreNode(DataFlowCall call, ArgumentNode arg, Content f1, Configuration config) {
exists(Content f2, Node out |
nodeCand1(arg, config) and
argumentValueFlowsThrough(call, arg, TContentSome(f1), TContentSome(f2), out) and
nodeCand1(out, unbind(config)) and
readStoreCand1(f1, unbind(config)) and
readStoreCand1(f2, unbind(config))
)
}
yields this pipeline:
[2020-03-18 13:01:44] (121s) Tuple counts for dom#DataFlowImpl::TReadStoreNode#ffff:
20715 ~0% {3} r1 = JOIN DataFlowImpl::readStoreCand1#ff AS L WITH DataFlowImplCommon::TContentSome#ff AS R ON FIRST 1 OUTPUT L.<1>, L.<0>, R.<1>
429111225 ~0% {4} r2 = JOIN r1 WITH DataFlowImpl::readStoreCand1#ff_10#join_rhs AS R ON FIRST 1 OUTPUT R.<1>, r1.<1>, r1.<0>, r1.<2>
429111225 ~0% {4} r3 = JOIN r2 WITH DataFlowImplCommon::TContentSome#ff AS R ON FIRST 1 OUTPUT r2.<3>, R.<1>, r2.<1>, r2.<2>
235878 ~45% {5} r4 = JOIN r3 WITH DataFlowImplCommon::Cached::FlowThrough::Final::argumentValueFlowsThrough#fffff#origOrder_23014#join_rhs AS R ON FIRST 2 OUTPUT R.<3>, r3.<2>, r3.<3>, R.<2>, R.<4>
198355 ~40% {6} r5 = JOIN r4 WITH DataFlowImpl::nodeCand1#ff AS R ON FIRST 1 OUTPUT r4.<1>, r4.<2>, r4.<3>, r4.<0>, r4.<4>, R.<1>
198355 ~40% {6} r6 = SELECT r5 ON r5.<1> >= r5.<5>
198355 ~40% {6} r7 = SELECT r6 ON r6.<1> <= r6.<5>
198355 ~46% {5} r8 = SCAN r7 OUTPUT r7.<4>, r7.<0>, r7.<2>, r7.<3>, r7.<5>
160468 ~57% {6} r9 = JOIN r8 WITH DataFlowImpl::nodeCand1#ff AS R ON FIRST 1 OUTPUT r8.<1>, r8.<2>, r8.<3>, r8.<0>, r8.<4>, R.<1>
160468 ~57% {6} r10 = SELECT r9 ON r9.<5> >= r9.<4>
160468 ~57% {6} r11 = SELECT r10 ON r10.<5> <= r10.<4>
160468 ~52% {4} r12 = SCAN r11 OUTPUT r11.<1>, r11.<2>, r11.<0>, r11.<4>
There was a problem hiding this comment.
That can't be right, that RA snippet cannot match that QL snippet. It's clearly joining on configuration and there's only two unbind in the RA vs 3 in the QL.
There was a problem hiding this comment.
I agree that it looks weird, but I just double checked, and the RA above is indeed what I get.
There was a problem hiding this comment.
But that RA cannot have come from that QL, so something is mismatched somewhere.
There was a problem hiding this comment.
Tried it again, same result. Would you mind trying?
aschackmull
left a comment
There was a problem hiding this comment.
A few more comments, still not finished reading through.
aschackmull
left a comment
There was a problem hiding this comment.
A few more comments.
|
Here are the updated |
|
The removed ZipSlip result looks like a FP, which is now successfully pruned by the additional cons candidate tracking. |
|
Let's leave the remaining join-order weirdness as-is, then we can deal with it once the cause is known. |
Highlights:
ReadTaintNodeandTaintStoreNodeto simplify logic for taintgetters and taint setters, respectively.
nodeCandFwd2: Restrictstoredcolumn after a read, based on what it mightbe before a store of the same field.
nodeCand2: Restrictreadcolumn (renamed fromstored) after a store, basedon what it might be after a read of the same field.
LocalFlowBigStepmodule.AccessPath[Front]class.flowCandFwd0: Restrictapfcolumn after a read, as it should be able to matcha Boolean
readcolumn fromnodeCand2.flowFwd0: Restrict columnsapandapfafter a read, by introducing aflowConsCandFwdpredicate (similar to what is done in the previous pruning steps).flowFwd0: Restrict columnsapandapfafter a store, by introducing aflowConsCandpredicate (similar to what is done in the previous pruning steps).