Ruby: add more Array/Enumerable flow summaries#7614
Conversation
57878ae to
8310a51
Compare
525a60d to
dbcc700
Compare
|
I've added a commit to enable taint-flow checks in the array-flow test, but it fails with this unexpected result, that I am unable to explain: |
5ffc9a0 to
8248a94
Compare
hmac
left a comment
There was a problem hiding this comment.
This is a mammoth piece of work @nickrolfe! I'm impressed and also a bit terrified 😅
I've reviewed most of the Array summaries. I'll take a look at Enumerable tomorrow.
I spent a bit of time trying to find the source of that spurious taint result, but got nowhere I'm afraid.
| a.cycle(2) do |x| | ||
| sink x # $ hasValueFlow=29 | ||
| a.combination(1) do |x| | ||
| sink(x[0]) # $ hasValueFlow=29 |
There was a problem hiding this comment.
combination returns an array of combinations, so it would be good to test that too:
b = a.combination(1)
sink(b[i][j]) # $ hasValueFlow=29There was a problem hiding this comment.
Are you sure? It looks like it returns either an enumerator or self:
[14] pry(main)> a
=> [0, 10, 20, 30]
[15] pry(main)> a.combination(1)
=> #<Enumerator: ...>
[16] pry(main)> a.combination(1) { |x| }
=> [0, 10, 20, 30]
I've updated the test to test the return value according to that behaviour.
There was a problem hiding this comment.
Sorry, I was sloppy with my description here. Without a block it returns an enumerator, but the enumerator's elements are combinations, i.e.
pry(main)> [1,2,3].combination(2).to_a
=> [[1, 2], [1, 3], [2, 3]]So it's the same deal as with the others, where we might want to pretend the enumerator is an array. A quick search indicates it's common to do things like foo.combination(n).to_a or foo.combination(n).each, so we'd need ArrayElement of Receiver -> ArrayElement[?] of ArrayElement[?] of ReturnValue to track flow through these expressions.
| or | ||
| exists(ArrayIndex elementIndex, int argIndex | | ||
| argIndex in [0 .. mc.getNumberOfArguments() - 1] and | ||
| elementIndex = mc.getArgument(argIndex).getConstantValue().getInt() |
There was a problem hiding this comment.
Out of interest, how does this line differ from this alternative?
elementIndex = getKnownArrayElementContent(mc.getArgument(argIndex))There was a problem hiding this comment.
I'm not actually sure. I noticed Tom's use of getKnownArrayElementContent at some point, and started using it, but didn't go back to this one. @hvitved is there any difference between them?
There was a problem hiding this comment.
No, I think that should be the same
hmac
left a comment
There was a problem hiding this comment.
Pretty much reviewed everything now. I think there's a few missing flow steps and some other super minor things but otherwise it looks 👌
| input = "ArrayElement of Receiver" and | ||
| output = "Parameter[0] of BlockArgument" and | ||
| preservesValue = true | ||
| } |
There was a problem hiding this comment.
Since
[:a,:taint1,:b].chunk{|x|x == :a ? :taint2 : x}.to_a
=> [[:taint2, [:a]], [:taint1, [:taint1]], [:b, [:b]]]I think we also want this flow:
ArrayElement of Receiver->ArrayElement[?] of ArrayElement[1] of ArrayElement[?] of ReturnValueReturnValue of BlockArgument->ArrayElement[0] of ArrayElement[?] of ReturnValue
There was a problem hiding this comment.
chunk returns an Enumerator rather than an array, but I think we still want to preserve knowledge of its contents as if it were an array, right?
There was a problem hiding this comment.
Like I mentioned in another comment, I didn't implement flow into enumerator return values for any of these methods. I effectively ignored the versions of methods that return enumerators. It's not great, but I'm also not convinced that treating the enumerator as if it were an array is right, either. 🤷
| override predicate propagatesFlowExt(string input, string output, boolean preservesValue) { | ||
| input = "ArrayElement of Receiver" and | ||
| output = ["Parameter[0] of BlockArgument", "Parameter[1] of BlockArgument"] and | ||
| preservesValue = true |
There was a problem hiding this comment.
Similarly here, I think we want
ArrayElement of Receiver -> ArrayElement[?] of ArrayElement[?] of ReturnValue
| or | ||
| input = "ReturnValue of BlockArgument" and | ||
| output = "ArrayElement[?] of ReturnValue" and | ||
| preservesValue = true |
There was a problem hiding this comment.
When no block is given to map it behaves like itself except wraps the array in an Enumerator. So we could have a summary for CollectWithoutBlockArgument which has flow from Receiver to ReturnValue. This is a real edge case though, so maybe not worth bothering with.
| override predicate propagatesFlowExt(string input, string output, boolean preservesValue) { | ||
| input = "ArrayElement of Receiver" and | ||
| output = "ArrayElement[?] of Parameter[0] of BlockArgument" and | ||
| preservesValue = true |
There was a problem hiding this comment.
When a block argument is given, each_cons returns the receiver
[1,2,3].each_cons(2) { |x| 9 } == [1,2,3]When no block is given, each_cons returns an enumerator whose elements are the successive n-tuples
[1,2,3].each_cons(2).to_a == [[1,2], [2,3]]So we may want this extra flow:
ArrayElement of Receiver->ArrayElement[?] of ReturnValueArrayElement of Receiver->ArrayElement[?] of ArrayElement[?] of ReturnValue
This is expected, and it happens because |
hvitved
left a comment
There was a problem hiding this comment.
Very impressive and thorough work Nick. I have only spot-checked a few summaries, and they looked fine.
nickrolfe
left a comment
There was a problem hiding this comment.
I've addressed a lot of the feedback. Thanks @hmac and @hvitved!
To summarise the remaining issues:
- So far, the summaries just kind of ignore the methods (or variants) that return enumerators. @hmac is effectively proposing that we treat them as if they were arrays. I'm not convinced.
- Side note: a lot of the methods return
selfif given a block, and an enumerator otherwise. We're currently ignoring the latter variants, but I think the summaries we do have for the block variants should be restricted withexists(mc.getBlock()).
- Side note: a lot of the methods return
- I couldn't make @hvitved's suggested changes to use
SimpleSummarizedCallablewithout introducing non-monotonic recursion errors. Is there a better way to avoid those?
| a.cycle(2) do |x| | ||
| sink x # $ hasValueFlow=29 | ||
| a.combination(1) do |x| | ||
| sink(x[0]) # $ hasValueFlow=29 |
There was a problem hiding this comment.
Are you sure? It looks like it returns either an enumerator or self:
[14] pry(main)> a
=> [0, 10, 20, 30]
[15] pry(main)> a.combination(1)
=> #<Enumerator: ...>
[16] pry(main)> a.combination(1) { |x| }
=> [0, 10, 20, 30]
I've updated the test to test the return value according to that behaviour.
| a = [0, 1, source(34)] | ||
| a.cycle(2) do |x| | ||
| sink x # $ hasValueFlow=34 | ||
| end |
There was a problem hiding this comment.
When called without a block, it returns an enumerator. Your suggested addition wouldn't work:
[26] pry(main)> b = a.cycle(2)
=> #<Enumerator: ...>
[27] pry(main)> b[0]
NoMethodError: undefined method `[]' for #<Enumerator: [0, 10, 20, 30]:cycle(2)>
As far as I know, we're not tracking flow into enumerator return values for any of these methods. So, if we made it b = a.cycle(2).to_a, I don't think we are able to track that flow.
| or | ||
| exists(ArrayIndex elementIndex, int argIndex | | ||
| argIndex in [0 .. mc.getNumberOfArguments() - 1] and | ||
| elementIndex = mc.getArgument(argIndex).getConstantValue().getInt() |
There was a problem hiding this comment.
I'm not actually sure. I noticed Tom's use of getKnownArrayElementContent at some point, and started using it, but didn't go back to this one. @hvitved is there any difference between them?
| input = "ArrayElement of Receiver" and | ||
| output = "Parameter[0] of BlockArgument" and | ||
| preservesValue = true | ||
| } |
There was a problem hiding this comment.
Like I mentioned in another comment, I didn't implement flow into enumerator return values for any of these methods. I effectively ignored the versions of methods that return enumerators. It's not great, but I'm also not convinced that treating the enumerator as if it were an array is right, either. 🤷
| exists(ArrayIndex i | | ||
| input = "ArrayElement[" + i + "] of Receiver" and | ||
| output = "ArrayElement[" + i + "] of ReturnValue" | ||
| ) |
|
Oh, I see I missed some of Tom's comments, because they were made on an earlier commit, before I moved everything to a new file. Edit: now addressed. |
Even when summaries are shared in a single class.
|
I think the only remaining question mark is what we do about the methods that return enumerators. My argument is we should track flow through them as if they were arrays, so that expressions like |
|
+1 for treating enumerables as arrays, or more specifically: treating We could also use an access path token |
|
Does the enumerable stuff need to be in this PR though? I'd would simplify the access path rewrite if this PR gets merged ASAP. |
|
I think the changes here are good enough that we could merge, to let you do your rewrite, and I could handle the rest in a followup PR after that. However, there were two questions for @hvitved that I would not like to be missed:
|
It will be renamed as part of my work in implementing flow through hashes. |
D'oh, my comments were still pending.
No.
That would allow flow in this case, which we don't want: x = "tainted"
y = x.each_slice
sink(x) |
Sorry, this is really big, and it might be too tedious to review all the changes.