Java: Add QL support for automodel application mode #13239

tausbn · 2023-05-22T13:56:38Z

This adds application mode extraction queries, using the legacy implementation as a guideline. We expect the behaviour to be mostly the same, with the following differences:

- This includes an `UninterestingToModelCharacteristic` implementation that excludes some well-known frameworks. These would otherwise make up a large fraction of the candidates, and we suggest removing them now that we have framework mode.
- This limits the number of negative examples to at most 100 per class. The negative examples could otherwise easily outnumber the candidates by a factor of 15-20, which seemed overkill given that only a handful ever make it into our prompts.
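The cap on negative examples can be pictured roughly as follows. This is a minimal Java sketch, not the QL used in this PR: the name `NegativeSampler`, the map-of-lists input, treating "class" as a plain string label, and sorting locations before truncating (so that the sample is deterministic) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, for illustration only; the real sampling is written in QL.
class NegativeSampler {
    /**
     * Keeps at most `limit` negative examples per class.
     * Keys are class labels, values are location strings of negative examples.
     */
    static Map<String, List<String>> capPerClass(Map<String, List<String>> negatives, int limit) {
        Map<String, List<String>> capped = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : negatives.entrySet()) {
            // Sort a copy so the same input always yields the same sample (an assumption).
            List<String> sorted = new ArrayList<>(entry.getValue());
            Collections.sort(sorted);
            int keep = Math.min(limit, sorted.size());
            capped.put(entry.getKey(), new ArrayList<>(sorted.subList(0, keep)));
        }
        return capped;
    }

    public static void main(String[] args) {
        List<String> many = new ArrayList<>();
        for (int i = 0; i < 1500; i++) {
            many.add(String.format("Foo.java:%04d", i));
        }
        Map<String, List<String>> negatives = new TreeMap<>();
        negatives.put("com.example.Foo", many);
        negatives.put("com.example.Bar", List.of("Bar.java:0001"));

        Map<String, List<String>> capped = capPerClass(negatives, 100);
        System.out.println(capped.get("com.example.Foo").size()); // 100
        System.out.println(capped.get("com.example.Bar").size()); // 1
    }
}
```

Classes with fewer than 100 negatives pass through unchanged, so the cap only affects the classes that would otherwise dominate the output.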

When reviewing this, feel free to run the extraction on a DB locally and tell us if you spot anything that should/shouldn't be a candidate, or +/- example in your opinion.

We've split this up into a large number of mostly small commits, but it is probably easiest to review the code as a whole. YMMV.

jhelie · 2023-05-24T15:25:34Z

Drive by comment but I think CallContext should lift the wider [context][region] rather than the call itself?

tausbn · 2023-05-24T15:38:48Z

> Drive by comment but I think CallContext should lift the wider [context][region] rather than the call itself?

If I understand correctly, then this is something @kaeluka and I discussed, and were planning on doing (but didn't get around to implementing yet). 👍

Thanks for the reminder! 🙂

kaeluka · 2023-05-30T08:06:08Z

@adityasharad are you able to give this a review?

kaeluka · 2023-06-07T14:52:13Z

> Following up on my above message, and based on the work in the sister PR github/codeml-automodel#117, here is more data on the changes in candidates for the application mode (assessed using the same android source suite):
>
> candidates_duplicates.sarif.txt candidates_additional.sarif.txt candidates_missing.sarif.txt
>
> @kaeluka @tausbn as discussed I think it's worth having a look at these (especially the additional candidates) before merging this PR - to sanity check everything is as we'd expect it.

The data that @jhelie sent us in this PR is the next thing we'll look into. Then, we still need to resolve @adityasharad's comment above. Then this should be mergeable.

kaeluka · 2023-06-08T09:35:34Z

I started looking into the additional sinks in this PR, compared to the ones in the old branch. I found that the majority of the additional sinks are models where `input = Argument[this]`. The old branch skipped those, I think. This branch is right, IMO, not to skip them.

The rest of the additional sinks I've looked at all seem to be good and valid candidates; it's difficult to say why the old branch would have omitted them.

I'll send @jhelie a gist of my findings, but won't post it here in public.

kaeluka · 2023-06-09T13:01:57Z

@tausbn and I have now also looked at the sinks we were missing.

Out of the 65 missing sinks:

- 62 of them were dropped by the characteristic we have already removed from this PR in favour of revisiting later.
- 3 of them were dropped due to a bug (nice catch, @jhelie 🥳) that we have fixed in the latest commit (b38bc52). For that DB, the bug fix produces exactly those three additional candidates and doesn't otherwise change the results.

With that, the investigation of the diffs that Jean has sent us is done in our opinion.

Additionally, I'll send Jean a complete gist of the differences as we have investigated — just like with the additional sinks last time.

jhelie · 2023-06-09T13:05:21Z

Thanks @kaeluka, good news 👍 And what about the additional ones? Are they accounted for by #13372 ?

kaeluka · 2023-06-12T08:08:32Z

> Thanks @kaeluka, good news 👍 And what about the additional ones? Are they accounted for by #13372 ?

There may be some, but we didn't find any in the sample we spot-checked.

jhelie · 2023-06-12T08:49:46Z

Oh apologies, I had missed your earlier message! #13239 (comment)

jhelie · 2023-06-12T08:52:19Z

One last question on the topic then: there are not many duplicate candidates (193) but could they be indicative of some small issue with the way we identify/characterise the location of candidates?

kaeluka · 2023-06-12T16:07:06Z

> One last question on the topic then: there are not many duplicate candidates (193) but could they be indicative of some small issue with the way we identify/characterise the location of candidates?

Oh!! Those fell under the table. Apologies. Give me until early tomorrow — shouldn't take longer than that for a look 👀

kaeluka · 2023-06-13T09:24:08Z

I've looked over these now and it appears that there aren't any duplicates in there. Some of the 'duplicates' are different sink candidates in similar locations, while others are similar candidates in different locations. If you disagree, please slack us a counterexample 👍

kaeluka · 2023-06-13T11:26:44Z

I think this is now ready to merge, can we get an approve, @jhelie?

jhelie · 2023-06-13T12:09:11Z

> I've looked over these now and it appears that there aren't any duplicates in there. Some of the 'duplicates' are different sink candidates in similar locations, while others are similar candidates in different locations. If you disagree, please slack us a counterexample 👍

Thanks for having a look. Just to make sure I completely understand here are 2 examples:

"296e18418bb32f4c:1_0_1": {
        "ruleId": "java/ml/extract-automodel-application-candidates",
        "rule": {
            "id": "java/ml/extract-automodel-application-candidates",
            "index": 0,
            "toolComponent": {
                "index": 0
            }
        },
        "message": {
            "text": "command-injection, sql, ssrf, tainted-path\nrelated locations: [CallContext](1).\nmetadata: [package](2), [type](3), [subtypes](4), [name](5), [signature](6), [input](7)."
        },
        "locations": [
            {
                "physicalLocation": {
                    "artifactLocation": {
                        "uri": "app/src/main/java/com/vuldroid/application/SplashScreen.java",
                        "uriBaseId": "%SRCROOT%",
                        "index": 89
                    },
                    "region": {
                        "startLine": 58,
                        "startColumn": 9,
                        "endColumn": 91
                    },
                    "contextRegion": {
                        "startLine": 56,
                        "endLine": 60,
                        "snippet": {
                            "text": "\n    public void chekPermission(){\n        Dexter.withContext(this).withPermission(Manifest.permission.READ_EXTERNAL_STORAGE).withListener(new PermissionListener() {\n            @Override\n            public void onPermissionGranted(PermissionGrantedResponse permissionGrantedResponse) {\n"
                        }
                    }
                }
            }
        ],
        "partialFingerprints": {
            "primaryLocationLineHash": "296e18418bb32f4c:1",
            "primaryLocationStartColumnFingerprint": "0"
        },

and

"296e18418bb32f4c:1_0_2": {
        "ruleId": "java/ml/extract-automodel-application-candidates",
        "rule": {
            "id": "java/ml/extract-automodel-application-candidates",
            "index": 0,
            "toolComponent": {
                "index": 0
            }
        },
        "message": {
            "text": "command-injection, sql, ssrf, tainted-path\nrelated locations: [CallContext](1).\nmetadata: [package](2), [type](3), [subtypes](4), [name](5), [signature](6), [input](7)."
        },
        "locations": [
            {
                "physicalLocation": {
                    "artifactLocation": {
                        "uri": "app/src/main/java/com/vuldroid/application/SplashScreen.java",
                        "uriBaseId": "%SRCROOT%",
                        "index": 89
                    },
                    "region": {
                        "startLine": 58,
                        "startColumn": 9,
                        "endColumn": 33
                    },
                    "contextRegion": {
                        "startLine": 56,
                        "endLine": 60,
                        "snippet": {
                            "text": "\n    public void chekPermission(){\n        Dexter.withContext(this).withPermission(Manifest.permission.READ_EXTERNAL_STORAGE).withListener(new PermissionListener() {\n            @Override\n            public void onPermissionGranted(PermissionGrantedResponse permissionGrantedResponse) {\n"
                        }
                    }
                }
            }
        ],
        "partialFingerprints": {
            "primaryLocationLineHash": "296e18418bb32f4c:1",
            "primaryLocationStartColumnFingerprint": "0"
        },

Initially I thought it was the same entry copied, but it isn't: the location (including the file) is the same, and the endColumn is the only thing that changes.

The contextRegion for both starts with:

  public void chekPermission(){
        Dexter.withContext(this).withPermission(Manifest.permission.READ_EXTERNAL_STORAGE).withListener(new PermissionListener() {
            @Override
            public void onPermissionGranted(PermissionGrantedResponse permissionGrantedResponse) {

Could you clarify which 2 candidates the region entries refer to?

edit: based on the name entry further in the sarif they are withPermission and withListener but I don't get why the "startColumn": 9 of withListener is the same as that of withPermission ?

jhelie · 2023-06-13T12:21:42Z

> I think this is now ready to merge, can we get an approve, @jhelie?

I haven't followed the latest rounds of discussion with @adityasharad so I'll let him do the ✅

kaeluka · 2023-06-13T14:06:56Z

> edit: based on the name entry further in the sarif they are ... and ... but I don't get why the "startColumn": 9 of withListener is the same as that of withPermission ?

If you look at the snippets, you'll see that the expressions literally start at the same location

jhelie · 2023-06-13T14:08:17Z

> If you look at the snippets, you'll see that the expressions literally start at the same location

I don't understand sorry: they are on the same line but not the same column?

adityasharad · 2023-06-13T15:31:35Z

This is a weird example because of the chained calls (x.y(...).z(...)) and how such locations are described.

The first call is:

Dexter.withContext(this).withPermission(Manifest.permission.READ_EXTERNAL_STORAGE)

The second call is:

Dexter.withContext(this).withPermission(Manifest.permission.READ_EXTERNAL_STORAGE).withListener(new PermissionListener() {...})

The way the locations are constructed for these call expressions, both start on the same line and column but end on different columns. The start column of each call to `withX` is (sensibly but somewhat counterintuitively) not the column just before the `w`, but the column just before `Dexter`, because the qualifiers are included.
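The column arithmetic can be checked with a small Java sketch. It only mimics how the reported regions line up on the quoted source line; `ChainedCallColumns` and its methods are hypothetical names, and this is not how CodeQL computes locations. It assumes 1-based columns and an exclusive `endColumn`, as in SARIF.

```java
// Illustrative only: reproduces the reported column spans by plain string search.
class ChainedCallColumns {
    /** 1-based column of the first character of `expr` within `line`. */
    static int startColumn(String line, String expr) {
        return line.indexOf(expr) + 1;
    }

    /** 1-based column just past the last character of `expr` (exclusive end). */
    static int endColumn(String line, String expr) {
        return line.indexOf(expr) + expr.length() + 1;
    }

    public static void main(String[] args) {
        // Line 58 of SplashScreen.java, as quoted in the SARIF snippet above.
        String line = "        Dexter.withContext(this).withPermission("
                + "Manifest.permission.READ_EXTERNAL_STORAGE).withListener(new PermissionListener() {";
        String inner = "Dexter.withContext(this)";
        String outer = inner + ".withPermission(Manifest.permission.READ_EXTERNAL_STORAGE)";

        // Both expressions begin at `Dexter`, so they share a start column.
        System.out.println(startColumn(line, inner) + ".." + endColumn(line, inner)); // 9..33
        System.out.println(startColumn(line, outer) + ".." + endColumn(line, outer)); // 9..91
    }
}
```

The two spans match the `endColumn` values 33 and 91 in the regions quoted earlier. Read this way, the shorter region covers `Dexter.withContext(this)` and the longer one covers the expression up to the end of `withPermission(...)`. That would be consistent with each candidate being the qualifier of the named call, though that reading is inferred from the columns and is not stated in the thread.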

atorralba · 2023-06-14T07:04:06Z

java/ql/src/Telemetry/AutomodelApplicationModeCharacteristics.qll

+ * A class representing nodes that are arguments to calls.
+ */
+private class ArgumentNode extends DataFlow::Node {
+  ArgumentNode() { this.asExpr() = [any(Call c).getAnArgument(), any(Call c).getQualifier()] }


I've been thinking about this. This isn't taking into account variadic arguments, which means that, for the following:

    public void test(String a, String... b) {}
    public void test2() { test("a", "b", "c"); }

The arguments of test will have the following positions and generate the respective MaD "input" strings, if I understand the current CodeQL implementation correctly:

| Expr | idx | input |
| --- | --- | --- |
| `test(...)` | -1 | `Argument[this]` |
| `"a"` | 0 | `Argument[0]` |
| `"b"` | 1 | `Argument[1]` |
| `"c"` | 2 | `Argument[2]` |

However, the signature of test is test(String,String[]), which means that Argument[2] won't match anything. In this case, both "b" and "c" should generate Argument[1] since they are part of the same variadic argument.

Note that the kind of DataFlow::Node that handles this isn't an ExprNode but rather an ImplicitVarargsArray. That's why using Node::asExpr won't work well in those cases.
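The adjustment described above could look roughly like this. It is a minimal Java sketch that assumes the number of declared parameters and the varargs flag are already known. `MadInput` and `inputFor` are hypothetical names, and the real fix would have to be made in the QL library rather than in Java.

```java
// Hypothetical helper, for illustration only.
class MadInput {
    /**
     * Maps a call-site argument index to a MaD "input" string.
     * `paramCount` is the number of declared parameters; `isVarargs` says
     * whether the last declared parameter is variadic.
     */
    static String inputFor(int argIndex, int paramCount, boolean isVarargs) {
        if (argIndex < -1) {
            throw new IllegalArgumentException("argument index must be >= -1");
        }
        if (argIndex == -1) {
            return "Argument[this]"; // index -1 denotes the qualifier
        }
        int lastParam = paramCount - 1;
        // Every argument past the variadic parameter collapses onto that parameter's index.
        int paramIndex = (isVarargs && argIndex > lastParam) ? lastParam : argIndex;
        return "Argument[" + paramIndex + "]";
    }

    public static void main(String[] args) {
        // test("a", "b", "c") against test(String a, String... b): two declared parameters.
        for (int idx = -1; idx <= 2; idx++) {
            System.out.println(idx + " -> " + inputFor(idx, 2, true));
        }
        // -1 -> Argument[this]
        // 0 -> Argument[0]
        // 1 -> Argument[1]
        // 2 -> Argument[1]
    }
}
```

With this mapping both `"b"` and `"c"` produce `Argument[1]`, which lines up with the signature `test(String,String[])`.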

Happy to handle this in a different PR if you want this merged ASAP, as long as you're aware of this shortcoming in the current implementation.

jhelie · 2023-06-14

Thanks @atorralba! I've added your comment to this list so that we don't forget about it, but my understanding is that it shouldn't prevent us from merging this for now.

atorralba

LGTM modulo the caveat explained above.

jhelie · 2023-06-14T09:42:20Z

🚀

github-actions bot added the Java label May 22, 2023

github-advanced-security bot found potential problems May 25, 2023

java/ql/src/Telemetry/AutomodelApplicationModeCharacteristics.qll (fixed)

tausbn and others added 13 commits May 25, 2023 14:15

Java: Add QL support for automodel application mode

6fc1657

Java: Add negative characteristic for static calls

9b30f9a

Java: add application-mode and framework-mode tags to extraction queries

185ad10

Java: make input an actual string, not an integer

7c3bc26

Java: update extraction query metadata

6e21f14

Java: remove unneeded abstract metadata extractor classes and fix some names

d93ad9b

Java: share isKnownKind between modes

db61a2d

Java: Avoid overlapping import

04b8bf3

Importing `AutomodelEndpointTypes` inside `AutomodelSharedUtil` non-privately made it overlap with the imports in the candidate extraction queries.

Java: Share argument indexing logic

11ab7e2

Adds a utility predicate for turning integer indices into the desired string representation.

Java: Port over characteristics from codex branch

2000f22

Java: remove superfluous characteristic

33fdb0f

Java: use containing call as call context, not argument

f224a40

Java: fine-tune characteristics

9a04124

kaeluka force-pushed the tausbn/automodel-application-mode branch from 0e6a404 to 9a04124 Compare May 25, 2023 12:16

Stephan Brandauer added 4 commits May 25, 2023 16:28

improve CannotBeTaintedCharacteristic

76d731a

Java: mark functional expressions as likely not sinks

db77c6b

remove some of the biggest frameworks from application mode consideration

5ca2221

Java: add extra known frameworks and sample negative samples to manage sarif file sizes

a89378d

kaeluka force-pushed the tausbn/automodel-application-mode branch from 3641a76 to a89378d Compare May 26, 2023 11:20

Java: better sampling of negative examples

efe539e

github-advanced-security bot found potential problems May 26, 2023

java/ql/src/Telemetry/AutomodelApplicationModeExtractNegativeExamples.ql (fixed)

Java: Get location ordering without toString

227c5fa

add support for sanitizers

d4b964c

kaeluka marked this pull request as ready for review May 30, 2023 09:03

kaeluka requested a review from a team as a code owner May 30, 2023 09:03

Stephan Brandauer added 3 commits June 7, 2023 14:09

Java: update getRelatedLocation qldoc

92ad02a

Java: share getCallable interface between automodel extraction modes

a8799fe

Java: comment why we're using erased types in MaD

7e77e2e

github-advanced-security bot found potential problems Jun 7, 2023

java/ql/src/Telemetry/AutomodelApplicationModeCharacteristics.qll (fixed)

Stephan Brandauer added 3 commits June 7, 2023 14:55

Java: share considerSubtypes predicate between Java modes

715b135

Java: qldoc style

ec3a7e3

Java: fix import

2921df4

Java: fix bug in ExcludedFromModeling Characteristic

b38bc52

adityasharad approved these changes Jun 13, 2023

atorralba reviewed Jun 14, 2023

atorralba approved these changes Jun 14, 2023

jhelie mentioned this pull request Jun 14, 2023

Java: mark MaD step sources as uninteresting to model in framework mode #13372

Closed

jhelie merged commit 209f3e2 into main Jun 14, 2023

jhelie deleted the tausbn/automodel-application-mode branch June 14, 2023 09:42
