-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Java: Add QL support for automodel application mode #13239
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Drive by comment but I think |
If I understand correctly, then this is something @kaeluka and I discussed, and were planning on doing (but didn't get around to implementing yet). 👍 Thanks for the reminder! 🙂 |
Importing `AutomodelEndpointTypes` inside `AutomodelSharedUtil` non-privately made it overlap with the imports in the candidate extraction queries.
Adds a utility predicate for turning integer indices into the desired string representation.
0e6a404 to
9a04124
Compare
3641a76 to
a89378d
Compare
java/ql/src/Telemetry/AutomodelApplicationModeExtractNegativeExamples.ql
Fixed
Show fixed
Hide fixed
|
@adityasharad are you able to give this a review? |
The data that @jhelie sent us in this PR is the next thing we'll look into. Then, we still need to resolve @adityasharad's comment above. Then this should be mergeable. |
|
I started looking into the additional sinks in this PR, compared to the ones in the old branch; I found the majority of additional sinks is models where input = The rest of the additional sinks I've looked at all seem good and valid candidates, it's difficult to say why the old branch would've omitted them. I'll send @jhelie a gist of my findings, but won't post it here in public. |
|
@tausbn and I have now also looked at the sinks we were missing. Out of the 65 missing sinks:
With that, the investigation of the diffs that Jean has sent us is done in our opinion. Additionally, I'll send Jean a complete gist of the differences as we have investigated — just like with the additional sinks last time. |
|
Oh apologies, I had missed your earlier message! #13239 (comment) |
|
One last question on the topic then: there are not many |
Oh!! Those fell under the table. Apologies. Give me until early tomorrow — shouldn't take longer than that for a look 👀 |
|
I've looked over these now and it appears that there aren't any duplicates in there. Some of the 'duplicates' are different sink candidates in similar locations, while others are similar candidates in different locations. If you disagree, please slack us a counterexample 👍 |
|
I think this is now ready to merge, can we get an approve, @jhelie? |
Thanks for having a look. Just to make sure I completely understand here are 2 examples: "296e18418bb32f4c:1_0_1": {
"ruleId": "java/ml/extract-automodel-application-candidates",
"rule": {
"id": "java/ml/extract-automodel-application-candidates",
"index": 0,
"toolComponent": {
"index": 0
}
},
"message": {
"text": "command-injection, sql, ssrf, tainted-path\nrelated locations: [CallContext](1).\nmetadata: [package](2), [type](3), [subtypes](4), [name](5), [signature](6), [input](7)."
},
"locations": [
{
"physicalLocation": {
"artifactLocation": {
"uri": "app/src/main/java/com/vuldroid/application/SplashScreen.java",
"uriBaseId": "%SRCROOT%",
"index": 89
},
"region": {
"startLine": 58,
"startColumn": 9,
"endColumn": 91
},
"contextRegion": {
"startLine": 56,
"endLine": 60,
"snippet": {
"text": "\n public void chekPermission(){\n Dexter.withContext(this).withPermission(Manifest.permission.READ_EXTERNAL_STORAGE).withListener(new PermissionListener() {\n @Override\n public void onPermissionGranted(PermissionGrantedResponse permissionGrantedResponse) {\n"
}
}
}
}
],
"partialFingerprints": {
"primaryLocationLineHash": "296e18418bb32f4c:1",
"primaryLocationStartColumnFingerprint": "0"
},
and "296e18418bb32f4c:1_0_2": {
"ruleId": "java/ml/extract-automodel-application-candidates",
"rule": {
"id": "java/ml/extract-automodel-application-candidates",
"index": 0,
"toolComponent": {
"index": 0
}
},
"message": {
"text": "command-injection, sql, ssrf, tainted-path\nrelated locations: [CallContext](1).\nmetadata: [package](2), [type](3), [subtypes](4), [name](5), [signature](6), [input](7)."
},
"locations": [
{
"physicalLocation": {
"artifactLocation": {
"uri": "app/src/main/java/com/vuldroid/application/SplashScreen.java",
"uriBaseId": "%SRCROOT%",
"index": 89
},
"region": {
"startLine": 58,
"startColumn": 9,
"endColumn": 33
},
"contextRegion": {
"startLine": 56,
"endLine": 60,
"snippet": {
"text": "\n public void chekPermission(){\n Dexter.withContext(this).withPermission(Manifest.permission.READ_EXTERNAL_STORAGE).withListener(new PermissionListener() {\n @Override\n public void onPermissionGranted(PermissionGrantedResponse permissionGrantedResponse) {\n"
}
}
}
}
],
"partialFingerprints": {
"primaryLocationLineHash": "296e18418bb32f4c:1",
"primaryLocationStartColumnFingerprint": "0"
},
Initially I thought it was the same line copied but the location (including the file) is the same - the The public void chekPermission(){
Dexter.withContext(this).withPermission(Manifest.permission.READ_EXTERNAL_STORAGE).withListener(new PermissionListener() {
@Override
public void onPermissionGranted(PermissionGrantedResponse permissionGrantedResponse) {could you just clarify for me what are the 2 candidates that the edit: based on the |
I haven't followed the latest rounds of discussion with @adityasharad so I'll let him do the ✅ |
If you look at the snippets, you'll see that the expressions literally start at the same location |
I don't understand sorry: they are on the same line but not the same column? |
|
This is a weird example because of the chained calls ( The first call is: Dexter.withContext(this).withPermission(Manifest.permission.READ_EXTERNAL_STORAGE)The second call is: Dexter.withContext(this).withPermission(Manifest.permission.READ_EXTERNAL_STORAGE).withListener(new PermissionListener() {...})The way the locations are constructed for these call expressions, both call expressions start on the same line and column, but end on different columns. The start column of each call to |
| * A class representing nodes that are arguments to calls. | ||
| */ | ||
| private class ArgumentNode extends DataFlow::Node { | ||
| ArgumentNode() { this.asExpr() = [any(Call c).getAnArgument(), any(Call c).getQualifier()] } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been thinking about this. This isn't taking into account variadic arguments, which means that, for the following:
public void test(String a, String... b) {}
public void test2() { test("a", "b", "c"); }The arguments of test will have the following positions and generate the respective MaD "input" strings, if I understand the current CodeQL implementation correctly:
| Expr | idx | input |
|---|---|---|
| test(...) | -1 | Argument[this] |
| "a" | 0 | Argument[0] |
| "b" | 1 | Argument[1] |
| "c" | 2 | Argument[2] |
However, the signature of test is test(String,String[]), which means that Argument[2] won't match anything. In this case, both "b" and "c" should generate Argument[1] since they are part of the same variadic argument.
Note that the kind of DataFlow::Node that handles this isn't an ExprNode but rather an ImplicitVarargsArray. That's why using Node::asExpr won't work well in those cases.
Happy to handle this in a different PR if you want this merged ASAP, as long as you're aware of this shortcoming in the current implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @atorralba ! I've added your comment to this list so that we don't forget about it but my understanding is that it shouldn't prevent us from merging this for now.
atorralba
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM modulo the caveat explained above.
|
|
This adds application mode extraction queries, using the legacy implementation as a guideline. We expect the behaviour to be mostly equal, but with some differences
UninterestingToModelCharacteristicimplementation that excludes some well-known frameworks. These would otherwise make up a large fraction of the candidates and we suggest removing them, as we now have framework mode.When reviewing this, feel free to run the extraction on a DB locally and tell us if you spot anything that should/shouldn't be a candidate, or +/- example in your opinion.
We've split this up in a large number of commits that are mostly rather small, but a review would be easiest to do on the whole code. YMMV.