Skip to content
Merged
Show file tree
Hide file tree
Changes from 12 commits
Commits
Show all changes
45 commits
Select commit Hold shift + click to select a range
60dc1af
Python: prepare to promote NoSqlInjection
yoff Aug 15, 2023
55707d3
Python: Make things compile in their new location
yoff Aug 15, 2023
db04597
Python: rename file
yoff Aug 23, 2023
087961d
Python: Refactor to allow customizations
yoff Aug 23, 2023
19046ea
Python: more renames
yoff Aug 23, 2023
bf8bfd9
Python: Add inline query test
yoff Sep 7, 2023
114984b
Python: Added tests based on security analysis
yoff Aug 29, 2023
c0b3245
Python: Enrich the NoSql concept
yoff Aug 31, 2023
7edebbe
Python: Add QLDocs
yoff Sep 6, 2023
f253f97
Python: update test expectations
yoff Sep 6, 2023
970e881
Python: Follow naming convention
yoff Sep 7, 2023
b07d085
Python: make test PoC a proper package
yoff Sep 7, 2023
d91cd21
Python: rename file
yoff Sep 8, 2023
154a369
Python: Add test for function
yoff Sep 11, 2023
d9f63e1
Python: Split modelling of query operators
yoff Sep 11, 2023
a063d7d
Python: sinks -> decodings
yoff Sep 11, 2023
4614b1a
Python: add change note
yoff Sep 18, 2023
5611bda
Python: add test for `$accumulator`
yoff Sep 19, 2023
30c37ca
Python: model `§accumulator`
yoff Sep 19, 2023
7c085ec
Python: Add test for `map_reduce`
yoff Sep 20, 2023
4ec8b3f
Python: Model `map_reduce`
yoff Sep 20, 2023
12dab88
Python: rename concept
yoff Sep 20, 2023
8156fa9
Apply naming suggestions from code review
yoff Sep 28, 2023
37a4f35
Python: further rename
yoff Sep 28, 2023
3fb579e
Python: add test for type tracking
yoff Sep 28, 2023
d90630a
Python: fix query file
yoff Sep 28, 2023
c2b6383
Apply suggestions from code review
yoff Sep 28, 2023
9682c82
Python: rename file
yoff Sep 28, 2023
2a739b3
Python: rename module
yoff Sep 28, 2023
eb1be08
Python: split modelling
yoff Sep 28, 2023
2a7b593
Python: Fix QL alerts
yoff Sep 28, 2023
a8e0023
Python: forgot to list framework
yoff Sep 28, 2023
d5b64c5
Python: update test expectations
yoff Sep 28, 2023
3043633
Python: Some renaming of flow states
yoff Sep 28, 2023
2e028a4
Apply suggestions from code review
yoff Sep 29, 2023
74d6f37
Python: update meta query `TaintSinks`
yoff Sep 29, 2023
2d845e3
Python: nicer paths
yoff Sep 29, 2023
e170805
Python: fix QL alert
yoff Sep 29, 2023
f3a0161
Python: rename flow states
yoff Sep 29, 2023
9769668
Python: require dict sinks be dangerous.
yoff Sep 29, 2023
16e1a00
Python: NoSQLInjection -> NoSqlInjection
RasmusWL Sep 29, 2023
d7ad5a0
Python: List NoSQL injection sinks
RasmusWL Sep 29, 2023
3676262
Python: Clean trailing whitespace
RasmusWL Sep 29, 2023
d6d13f8
Python: -> NoSQL in QLDocs
RasmusWL Sep 29, 2023
9b73bbf
Python: Add keyword argument support
RasmusWL Sep 29, 2023
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
62 changes: 62 additions & 0 deletions python/ql/lib/semmle/python/Concepts.qll
Original file line number Diff line number Diff line change
Expand Up @@ -378,6 +378,68 @@ module SqlExecution {
}
}

/** Provides a class for modeling NoSql execution APIs. */
Comment thread
yoff marked this conversation as resolved.
Outdated
module NoSqlQuery {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you consider naming this NoSqlExecution to match with our current SqlExecution concept? I don't disagree with the naming per-se, but I'm thinking consistency might be better here 😊

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just kept the name from the submission, but I agree that I should probably change it 👍

/**
* A data-flow node that executes NoSQL queries.
*
* Extend this class to model new APIs. If you want to refine existing API models,
* extend `NoSQLQuery` instead.
Comment thread
yoff marked this conversation as resolved.
Outdated
*/
abstract class Range extends DataFlow::Node {
/** Gets the argument that specifies the NoSql query to be executed. */
abstract DataFlow::Node getQuery();

/** Holds if this query will unpack/interpret a dictionary */
abstract predicate interpretsDict();

/** Holds if this query can be dangerous when run on a user-controlled string */
abstract predicate vulnerableToStrings();
}
}

/**
* A data-flow node that executes NoSQL queries.
*
* Extend this class to refine existing API models. If you want to model new APIs,
* extend `NoSQLQuery::Range` instead.
Comment thread
yoff marked this conversation as resolved.
Outdated
*/
class NoSqlQuery extends DataFlow::Node instanceof NoSqlQuery::Range {
/** Gets the argument that specifies the NoSql query to be executed. */
DataFlow::Node getQuery() { result = super.getQuery() }

/** Holds if this query will unpack/interpret a dictionary */
predicate interpretsDict() { super.interpretsDict() }

/** Holds if this query can be dangerous when run on a user-controlled string */
predicate vulnerableToStrings() { super.vulnerableToStrings() }
}

/** Provides classes for modeling NoSql sanitization-related APIs. */
module NoSqlSanitizer {
/**
* A data-flow node that collects functions sanitizing NoSQL queries.
*
* Extend this class to model new APIs. If you want to refine existing API models,
* extend `NoSQLSanitizer` instead.
*/
abstract class Range extends DataFlow::Node {
/** Gets the argument that specifies the NoSql query to be sanitized. */
abstract DataFlow::Node getAnInput();
}
}

/**
* A data-flow node that collects functions sanitizing NoSQL queries.
*
* Extend this class to model new APIs. If you want to refine existing API models,
* extend `NoSQLSanitizer::Range` instead.
*/
class NoSqlSanitizer extends DataFlow::Node instanceof NoSqlSanitizer::Range {
/** Gets the argument that specifies the NoSql query to be sanitized. */
DataFlow::Node getAnInput() { result = super.getAnInput() }
}

/**
* A data-flow node that executes a regular expression.
*
Expand Down
1 change: 1 addition & 0 deletions python/ql/lib/semmle/python/Frameworks.qll
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,7 @@ private import semmle.python.frameworks.MarkupSafe
private import semmle.python.frameworks.Multidict
private import semmle.python.frameworks.Mysql
private import semmle.python.frameworks.MySQLdb
private import semmle.python.frameworks.NoSQL
private import semmle.python.frameworks.Oracledb
private import semmle.python.frameworks.Peewee
private import semmle.python.frameworks.Phoenixdb
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@ private import python
private import semmle.python.dataflow.new.DataFlow
private import semmle.python.dataflow.new.TaintTracking
private import semmle.python.dataflow.new.RemoteFlowSources
private import experimental.semmle.python.Concepts
private import semmle.python.Concepts
private import semmle.python.ApiGraphs

private module NoSql {
Expand Down Expand Up @@ -91,16 +91,17 @@ private module NoSql {
result = mongoDBInstance().getMember(["get_collection", "create_collection"]).getReturn()
}

/** This class represents names of find_* relevant `Mongo` collection-level operation methods. */
private class MongoCollectionMethodNames extends string {
MongoCollectionMethodNames() {
this in [
"find", "find_raw_batches", "find_one", "find_one_and_delete", "find_and_modify",
"find_one_and_replace", "find_one_and_update", "find_one_or_404"
]
}
/** Gets the name of a find_* relevant `Mongo` collection-level operation method. */
private string mongoCollectionMethodName() {
result in [
"find", "find_raw_batches", "find_one", "find_one_and_delete", "find_and_modify",
"find_one_and_replace", "find_one_and_update", "find_one_or_404"
]
}

/** Gets the name of a mongo query operator that will interpret JavaScript. */
private string mongoQueryOperator() { result in ["$where", "$function"] }

/**
* Gets a reference to a `Mongo` collection method call
*
Expand All @@ -114,10 +115,29 @@ private module NoSql {
*/
private class MongoCollectionCall extends DataFlow::CallCfgNode, NoSqlQuery::Range {
MongoCollectionCall() {
this = mongoCollection().getMember(any(MongoCollectionMethodNames m)).getACall()
this = mongoCollection().getMember(mongoCollectionMethodName()).getACall()
}

override DataFlow::Node getQuery() { result = this.getArg(0) }

override predicate interpretsDict() { any() }

override predicate vulnerableToStrings() { none() }
}

private class MongoCollectionQueryOperator extends API::CallNode, NoSqlQuery::Range {
DataFlow::Node query;

MongoCollectionQueryOperator() {
this = mongoCollection().getMember(mongoCollectionMethodName()).getACall() and
query = this.getParameter(0).getSubscript(mongoQueryOperator()).asSink()
}

override DataFlow::Node getQuery() { result = query }

override predicate interpretsDict() { none() }

override predicate vulnerableToStrings() { any() }
}

/**
Expand Down Expand Up @@ -146,6 +166,10 @@ private module NoSql {
}

override DataFlow::Node getQuery() { result = this.getArgByName(_) }

override predicate interpretsDict() { any() }

override predicate vulnerableToStrings() { none() }
}

/** Gets a reference to `mongosanitizer.sanitizer.sanitize` */
Expand Down Expand Up @@ -176,4 +200,23 @@ private module NoSql {

override DataFlow::Node getAnInput() { result = this.getArg(0) }
}

/**
* An equality operator can protect against dictionary interpretation.
* For instance, in `{'password': {"$eq": password} }`, if a dictionary is injected into
* `password`, it will not match.
*/
private class EqualityOperator extends DataFlow::Node, NoSqlSanitizer::Range {
EqualityOperator() {
this =
mongoCollection()
.getMember(mongoCollectionMethodName())
.getParameter(0)
.getASubscript*()
.getSubscript("$eq")
.asSink()
}

override DataFlow::Node getAnInput() { result = this }
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,84 @@
/**
* Provides default sources, sinks and sanitizers for detecting
* "NoSql injection"
* vulnerabilities, as well as extension points for adding your own.
*/

import python
import semmle.python.dataflow.new.DataFlow
import semmle.python.dataflow.new.RemoteFlowSources
import semmle.python.Concepts

/**
* Provides default sources, sinks and sanitizers for detecting
* "NoSql injection"
* vulnerabilities, as well as extension points for adding your own.
*/
module NoSqlInjection {
private newtype TFlowState =
TStringInput() or
TDictInput()

/** A flow state, tracking the structure of the input. */
abstract class FlowState extends TFlowState {
/** Gets a textual representation of this element. */
abstract string toString();
}

/** A state where input is only a string. */
class StringInput extends FlowState, TStringInput {
override string toString() { result = "StringInput" }
}

/** A state where input is a dictionary. */
class DictInput extends FlowState, TDictInput {
override string toString() { result = "DictInput" }
}

/** A source allowing string inputs. */
abstract class StringSource extends DataFlow::Node { }

/** A source allowing dictionary inputs. */
abstract class DictSource extends DataFlow::Node { }

/** A sink vulnerable to user controlled strings. */
abstract class StringSink extends DataFlow::Node { }

/** A sink vulnerable to user controlled dictionaries. */
abstract class DictSink extends DataFlow::Node { }

/** A data flow node where a string is converted into a dictionary. */
abstract class StringToDictConversion extends DataFlow::Node {
/** Gets the argument that specifies the string to be converted. */
abstract DataFlow::Node getAnInput();

/** Gets the resulting dictionary. */
abstract DataFlow::Node getOutput();
}

/** A remote flow source considered a source of user controlled strings. */
class RemoteFlowSourceAsStringSource extends RemoteFlowSource, StringSource { }

/** A NoSQL query that is vulnerable to user controlled strings. */
class NoSqlQueryAsStringSink extends StringSink {
NoSqlQueryAsStringSink() {
exists(NoSqlQuery noSqlQuery | this = noSqlQuery.getQuery() |
noSqlQuery.vulnerableToStrings()
)
}
}

/** A NoSQL query that is vulnerable to user controlled dictionaries. */
class NoSqlQueryAsDictSink extends DictSink {
NoSqlQueryAsDictSink() { this = any(NoSqlQuery noSqlQuery).getQuery() }
}

/** A JSON decoding converts a string to a dictionary. */
class JsonDecoding extends Decoding, StringToDictConversion {
JsonDecoding() { this.getFormat() = "JSON" }

override DataFlow::Node getAnInput() { result = Decoding.super.getAnInput() }

override DataFlow::Node getOutput() { result = Decoding.super.getOutput() }
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,60 @@
/**
* Provides a taint-tracking configuration for detecting NoSQL injection vulnerabilities
*/

import python
import semmle.python.dataflow.new.DataFlow
import semmle.python.dataflow.new.TaintTracking
import semmle.python.Concepts
private import NoSQLInjectionCustomizations::NoSqlInjection as C
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where did this as C pattern come from? I have not seen it before, and would have thought we could do without it?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is to communicate the FlowState class from the imported module C to the declared module NoSqlInjectionConfig. Otherwise, we should rename the FlowState class inside C, but that seems a bit wrong..

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. C for customizations?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that was the thinking. I guess it is a little too short to convey anything..


/**
* A taint-tracking configuration for detecting NoSQL injection vulnerabilities.
*/
module NoSQLInjectionConfig implements DataFlow::StateConfigSig {
Comment thread Fixed
Comment thread
yoff marked this conversation as resolved.
Outdated
class FlowState = C::FlowState;

predicate isSource(DataFlow::Node source, FlowState state) {
source instanceof C::StringSource and
state instanceof C::StringInput
or
source instanceof C::DictSource and
state instanceof C::DictInput
}

predicate isSink(DataFlow::Node source, FlowState state) {
source instanceof C::StringSink and
(
state instanceof C::StringInput
or
// since dictionaries can encode strings
state instanceof C::DictInput
)
or
source instanceof C::DictSink and
state instanceof C::DictInput
}

predicate isBarrier(DataFlow::Node node, FlowState state) {
// Block `StringInput` paths here, since they change state to `DictInput`
exists(C::StringToDictConversion c | node = c.getOutput()) and
state instanceof C::StringInput
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a bit strange to block state StringInput, only to include DictInput for StringSinks in the sinks... could we remove the barrier instead?

Suggested change
predicate isSink(DataFlow::Node source, FlowState state) {
source instanceof C::StringSink and
(
state instanceof C::StringInput
or
// since dictionaries can encode strings
state instanceof C::DictInput
)
or
source instanceof C::DictSink and
state instanceof C::DictInput
}
predicate isBarrier(DataFlow::Node node, FlowState state) {
// Block `StringInput` paths here, since they change state to `DictInput`
exists(C::StringToDictConversion c | node = c.getOutput()) and
state instanceof C::StringInput
}
predicate isSink(DataFlow::Node source, FlowState state) {
source instanceof C::StringSink and
state instanceof C::StringInput
or
source instanceof C::DictSink and
state instanceof C::DictInput
}
predicate isBarrier(DataFlow::Node node, FlowState state) {
// NOTE: not blocking `StringInput` when converted to Dictionary, since
// `StringToDictConversion` also covers things like parsing JSON objects,
// where you don't actually know if it's a string or a dictionary that is
// returned.
}

Otherwise, we might consider renaming DictInput or StringOrDict, to highlight it could be either?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a fair point. A reformulation of the modelling might indeed make things simpler..

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the names of the flow states and the comments. I did not actually change the above code, in the end:

  • I believe we still need the second case in isSink, since we may have a path that starts in a DictSink and ends in a StringSink without going through any conversion. Your suggestion would remove that path.
  • I believe the barrier, while possibly semantically unnecessary, makes the graph lighter and thus more performant.


predicate isAdditionalFlowStep(
DataFlow::Node nodeFrom, FlowState stateFrom, DataFlow::Node nodeTo, FlowState stateTo
) {
exists(C::StringToDictConversion c |
nodeFrom = c.getAnInput() and
nodeTo = c.getOutput()
) and
stateFrom instanceof C::StringInput and
stateTo instanceof C::DictInput
}

predicate isBarrier(DataFlow::Node node) {
node = any(NoSqlSanitizer noSqlSanitizer).getAnInput()
}
}

module Flow = TaintTracking::GlobalWithState<NoSQLInjectionConfig>;
Comment thread
yoff marked this conversation as resolved.
Outdated
Original file line number Diff line number Diff line change
Expand Up @@ -11,10 +11,10 @@
*/

import python
import experimental.semmle.python.security.injection.NoSQLInjection
import DataFlow::PathGraph
import semmle.python.security.dataflow.NoSQLInjectionQuery
import Flow::PathGraph

from NoSqlInjection::Configuration config, DataFlow::PathNode source, DataFlow::PathNode sink
where config.hasFlowPath(source, sink)
select sink, source, sink, "This NoSQL query contains an unsanitized $@.", source,
from Flow::PathNode source, Flow::PathNode sink
where Flow::flowPath(source, sink)
select sink.getNode(), source, sink, "This NoSQL query contains an unsanitized $@.", source,
"user-provided value"
50 changes: 0 additions & 50 deletions python/ql/src/experimental/semmle/python/Concepts.qll
Original file line number Diff line number Diff line change
Expand Up @@ -216,56 +216,6 @@ class SqlEscape extends DataFlow::Node instanceof SqlEscape::Range {
DataFlow::Node getAnInput() { result = super.getAnInput() }
}

/** Provides a class for modeling NoSql execution APIs. */
module NoSqlQuery {
/**
* A data-flow node that executes NoSQL queries.
*
* Extend this class to model new APIs. If you want to refine existing API models,
* extend `NoSQLQuery` instead.
*/
abstract class Range extends DataFlow::Node {
/** Gets the argument that specifies the NoSql query to be executed. */
abstract DataFlow::Node getQuery();
}
}

/**
* A data-flow node that executes NoSQL queries.
*
* Extend this class to refine existing API models. If you want to model new APIs,
* extend `NoSQLQuery::Range` instead.
*/
class NoSqlQuery extends DataFlow::Node instanceof NoSqlQuery::Range {
/** Gets the argument that specifies the NoSql query to be executed. */
DataFlow::Node getQuery() { result = super.getQuery() }
}

/** Provides classes for modeling NoSql sanitization-related APIs. */
module NoSqlSanitizer {
/**
* A data-flow node that collects functions sanitizing NoSQL queries.
*
* Extend this class to model new APIs. If you want to refine existing API models,
* extend `NoSQLSanitizer` instead.
*/
abstract class Range extends DataFlow::Node {
/** Gets the argument that specifies the NoSql query to be sanitized. */
abstract DataFlow::Node getAnInput();
}
}

/**
* A data-flow node that collects functions sanitizing NoSQL queries.
*
* Extend this class to model new APIs. If you want to refine existing API models,
* extend `NoSQLSanitizer::Range` instead.
*/
class NoSqlSanitizer extends DataFlow::Node instanceof NoSqlSanitizer::Range {
/** Gets the argument that specifies the NoSql query to be sanitized. */
DataFlow::Node getAnInput() { result = super.getAnInput() }
}

/** Provides classes for modeling HTTP Header APIs. */
module HeaderDeclaration {
/**
Expand Down
Loading