Skip to content

C++: Remove CP in cpp/command-line-injection#10123

Merged
jketema merged 1 commit into
github:mainfrom
MathiasVP:optimize-exec-tainted
Aug 22, 2022
Merged

C++: Remove CP in cpp/command-line-injection#10123
jketema merged 1 commit into
github:mainfrom
MathiasVP:optimize-exec-tainted

Conversation

@MathiasVP
Copy link
Copy Markdown
Contributor

Prior to this PR, the sink in ExecTaintConfiguration was a cartesian product between all concatenation-like functions and all shell-command executing functions (😱) This PR introduces another taint tracking configuration that limits the possible flow states to only those that are relevant for a given sink.

Here are some numbers from Samate:

On main:

Tuple counts for DataFlowImpl#9021cc4c::sinkNode#fff/3@1ba010mo after 17.3s:
  48709004  ~6%     {3} r1 = SCAN num#DataFlowImpl#9021cc4c::TNodeNormal#ff OUTPUT In.0, "ExecTaintConfiguration", In.1 'node'
  10846     ~0%     {2} r2 = JOIN r1 WITH ExecTainted#91000ffb::ExecTaintConfiguration::isSink#dispred#cpe#2#f ON FIRST 1 OUTPUT "ExecTaintConfiguration", Lhs.2 'node'
  235097896 ~0%     {3} r3 = JOIN r2 WITH project#ExecTainted#91000ffb::ExecState#fff#3 CARTESIAN PRODUCT OUTPUT "ExecTaintConfiguration", Lhs.1 'node', Rhs.0
  235097896 ~0%     {3} r4 = r3 AND NOT DataFlowImpl#9021cc4c::stateBarrier#fff(Lhs.1 'node', Lhs.2 'state', "ExecTaintConfiguration")
  235097896 ~2%     {3} r5 = SCAN r4 OUTPUT In.1 'node', In.2 'state', "ExecTaintConfiguration"
                    return r5

On this PR:

[2022-08-21 16:34:58] Evaluated non-recursive predicate DataFlowImpl#9021cc4c::sinkNode#fff@aa41362f in 1ms (size: 4188).
Evaluated relational algebra for predicate DataFlowImpl#9021cc4c::sinkNode#fff@aa41362f with tuple counts:
        4188  ~1%    {3} r1 = JOIN num#DataFlowImpl#9021cc4c::TNodeNormal#ff WITH ExecTainted#91000ffb::ExecTaintConfiguration::isSink#dispred#cpe#23#ff ON FIRST 1 OUTPUT "ExecTaintConfigurationAAA", Lhs.1, Rhs.1
        4188  ~1%    {3} r2 = r1 AND NOT DataFlowImpl#9021cc4c::stateBarrier#fff(Lhs.1, Lhs.2, "ExecTaintConfigurationAAA")
        4188  ~0%    {3} r3 = SCAN r2 OUTPUT In.1, In.2, "ExecTaintConfigurationAAA"
                     return r3

@MathiasVP MathiasVP requested a review from a team as a code owner August 21, 2022 15:48
@github-actions github-actions Bot added the C++ label Aug 21, 2022
@MathiasVP MathiasVP added the no-change-note-required This PR does not need a change note label Aug 21, 2022
@jketema
Copy link
Copy Markdown
Contributor

jketema commented Aug 21, 2022

I find the original behaviour somewhat unexpected, as that seems to indicate there's no inherent connection between the flow path and its eventual sink, on the one hand, and the flow state, on the other hand. In wonder if this is a restriction of the implementation or an actual bug. It's also not clear to me why this wouldn't have resulted in FPs before this change.

@jketema
Copy link
Copy Markdown
Contributor

jketema commented Aug 21, 2022

Or is this just a consequence of bottom-up evaluation?

@MathiasVP
Copy link
Copy Markdown
Contributor Author

I find the original behaviour somewhat unexpected, as that seems to indicate there's no inherent connection between the flow path and its eventual sink, on the one hand, and the flow state, on the other hand. In wonder if this is a restriction of the implementation or an actual bug. It's also not clear to me why this wouldn't have resulted in FPs before this change.

Or is this just a consequence of bottom-up evaluation?

Yeah, exactly. What currently happens on main is that we first evaluate the giant cartesian product to compute sinkNode#fff and then, at some point during the dataflow recursion, we'll reach a specific ExecState state through the additional taint step (i.e., the concatenation step).

This PR reduces the set of ExecState states initially computed by sinkNode#fff to only the ones that can be reached by the node that receives flow after the concatenation. So no behavior is actually changed (and I think the behavior on main is correct).

Does that make sense?

@jketema
Copy link
Copy Markdown
Contributor

jketema commented Aug 22, 2022

Does that make sense?

Absolutely.

@jketema jketema merged commit 04564b4 into github:main Aug 22, 2022
@geoffw0
Copy link
Copy Markdown
Contributor

geoffw0 commented Aug 22, 2022

I think perhaps this query was trying to be a bit too clever with flow states. From what I understand of what its doing I'd probably have written it with just two, "safe" and "dangerouslyConcatenated". This would mean we get less detail in the violation message, so assuming its robust this fix is probably to be preferred.

@jketema
Copy link
Copy Markdown
Contributor

jketema commented Aug 22, 2022

From what I understand of what its doing I'd probably have written it with just two, "safe" and "dangerouslyConcatenated".

Yeah. I had something like that originally. However, that made it impossible to report anything about the location of the concatenation, which would have been a major regression compared to the two stitched together dataflow configurations that were there previously.

@MathiasVP
Copy link
Copy Markdown
Contributor Author

I do think reporting the location of the concatenation is very helpful. I can imagine that would be a natural place to insert a sanitizer (or whatever people would do to fix the vulnerability).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

C++ no-change-note-required This PR does not need a change note

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants