JS: add sources and sinks for typeahead.js by erik-krogh · Pull Request #2429 · github/codeql

erik-krogh · 2019-11-25T13:18:58Z

This PR is inspired by a CVE that is still not flagged after this PR (its source is not modelled).

There are multiple parts in this PR, the most important being the XSS sink, demonstrated by the code below:

$('.typeahead').typeahead({},
  {
    name: 'dashboards',
    source: function (query, cb) {
      var target = document.location.search // <- source
      cb([target]); // <- target flows to the `val` param further down (the callback takes an array of suggestions).
    },
    templates: {
      // strings returned by the `suggestion` function are inserted directly into the DOM as HTML.
      suggestion: function(val) { 
        return val; // <- sink.  
      }
    }
  }
)
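One way to neutralise this particular sink, sketched under the assumption that the suggestions are plain strings meant to be shown as text: escape HTML metacharacters before building the template's markup. `escapeHtml` is a hypothetical helper written for illustration; it is not part of typeahead.js or of this PR.

```javascript
// Hypothetical helper (not part of typeahead.js): escapes HTML metacharacters
// so the suggestion is rendered as text rather than parsed as markup.
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, "&amp;") // must run first so later entities are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// The same dataset as in the example, minus the source; only the template changes.
const dataset = {
  name: "dashboards",
  templates: {
    suggestion: function (val) {
      // Wrap the escaped text in a single element instead of returning `val` raw.
      return "<div>" + escapeHtml(val) + "</div>";
    },
  },
};

console.log(dataset.templates.suggestion("<img src=x onerror=alert(1)>"));
// prints: <div>&lt;img src=x onerror=alert(1)&gt;</div>
```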

The source can be more complicated, especially if a library is used to create it.
Below is an example of that, where the Bloodhound class (part of the typeahead.js library) is used to fetch results from a remote URL:

var autocompleter = new Bloodhound({
  prefetch: remoteUrl
})
autocompleter.initialize();
$('.typeahead').typeahead({}, {
  source: autocompleter.ttAdapter(),
  templates: {
    suggestion: function(val) { // val is a JSON object from `remoteUrl`. 
      return val; // sink
    }
  }
})

I've modelled this source using the ClientRequest class, but I've cheated.
The getAResponseDataNode method is supposed to return the first node where the response data occurs. In the above example that would be the `val` parameter.
However, presenting a developer with a taint path that lies entirely inside the suggestion function, when the actual source is within the Bloodhound class, is hardly useful.
I've therefore made getAResponseDataNode return the Bloodhound instance itself; this is technically incorrect, but it gives a more useful taint path.
The getAResponseDataNode method also contains a skeleton of the code that would be required to do it the "right way" (it still needs type-tracking to be added).
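The following is a deliberately simplified simulation (plain JavaScript, no jQuery) of the Bloodhound example, to show where the response data first becomes visible to client code. `renderDataset` and `fakeAdapter` are hypothetical: `renderDataset` only approximates what typeahead.js does with a dataset, and `fakeAdapter` stands in for `autocompleter.ttAdapter()`, assuming the adapter answers remote queries through the third (async) callback.

```javascript
// Simplified stand-in for typeahead.js (illustration only): call the dataset's
// `source` with the query and two callbacks, and pass every suggestion handed
// to either callback to `templates.suggestion`, whose result is used as HTML.
function renderDataset(dataset, query) {
  const html = [];
  const render = (suggestions) => {
    for (const s of suggestions) html.push(dataset.templates.suggestion(s));
  };
  dataset.source(query, render, render); // (query, syncResults, asyncResults)
  return html;
}

// Hypothetical stand-in for `autocompleter.ttAdapter()`. A real adapter would
// answer asynchronously after fetching `remoteUrl`; this one answers at once.
function fakeAdapter(query, syncResults, asyncResults) {
  syncResults([]); // nothing cached locally
  asyncResults([{ name: query + "-result" }]); // pretend this JSON came from the server
}

const html = renderDataset(
  {
    source: fakeAdapter,
    templates: {
      // `val` is the first place in client code where the response data appears
      suggestion: (val) => "<div>" + val.name + "</div>",
    },
  },
  "x"
);
console.log(html); // [ '<div>x-result</div>' ]
```

In this picture the server's JSON only ever reaches client code as a value handed to a callback and then received as `val`, which is why connecting it back to the Bloodhound instance requires type-tracking.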

I've found some TPs using this source/sink pair: https://lgtm.com/query/7028618148576176487/
(They require that semmle.javascript.heuristics.AdditionalSources is imported.)
None of them are really dangerous.

esbena

This is very nice, thank you for modelling how the source part of typeahead works.
My main concern is about the semantic consequences of the well-meaning improvements to the path-explanations.

Other things to consider:

Performance evaluation, since we have added default taint steps
Change notes
Is the model general enough to support suggestion providers other than Bloodhound? (I think it is, but please double-check; perhaps make a note of other common suggestion providers that may be worth modelling later.)

esbena · 2019-11-26T08:15:03Z

+module Typeahead {
+  /**
+   * A reference to the Bloodhound class, which is a utility-class for generating auto-complete suggestions.
+   * Sometimes these suggestions can originate from remote sources.


Sometimes these suggestions can originate from remote sources.

This part of the docstring can be dropped. We care about remote sources elsewhere.

esbena · 2019-11-26T08:16:48Z

+   */
+  class Bloodhound extends DataFlow::SourceNode {
+    Bloodhound() {
+      this = DataFlow::moduleImport("typeahead.js/dist/bloodhound.js")


😱
I think the more correct way to import Bloodhound is require('bloodhound-js'), using this package: https://www.npmjs.com/package/bloodhound-js. Can we support that as well?

esbena · 2019-11-26T08:26:40Z

+     * or an object containing an "url" property.
+     */
+    override DataFlow::Node getUrl() {
+      if exists(option.getALocalSource().getAPropertyWrite("url"))


It is usually fine to just formulate this as result = option.getALocalSource().getAPropertyWrite("url").getRhs() or result = option. I can't recall any examples right now, though. The more I think about this, the more it feels like a déjà vu discussion...

Both formulations have drawbacks. The current formulation would fail to find urlB in the following case:

new Bloodhound({remote: advanced? {url: urlA, ...}: urlB})

My proposed formulation would also find {url: urlA, ...} to be the URL.

If getUrl() is used together with mayHaveStringValue(..), then the spurious value from your proposed formulation is not an issue.
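The trade-off can be sketched outside QL by treating the possible local sources of the `remote` option as a plain array of values. This is a JavaScript simulation written for illustration, not the QL in the PR; `getUrlCurrent`, `getUrlProposed`, and `stringsOnly` are hypothetical names, and `stringsOnly` only approximates restricting results with `mayHaveStringValue(..)`.

```javascript
// Hypothetical model of the analysis, not the QL from this PR: the possible
// local sources of the `remote` option in
//   new Bloodhound({remote: advanced ? {url: urlA} : urlB})
// are represented as a plain array of values.
const urlA = "https://example.com/a?q=%QUERY";
const urlB = "https://example.com/b?q=%QUERY";
const optionSources = [{ url: urlA }, urlB];

// Values of `url` properties written on object sources.
const urlProperties = (sources) =>
  sources
    .filter((s) => typeof s === "object" && s !== null && "url" in s)
    .map((s) => s.url);

// Current formulation: if any source has a `url` property, use only those
// values; otherwise fall back to the option itself.
function getUrlCurrent(sources) {
  const props = urlProperties(sources);
  return props.length > 0 ? props : sources;
}

// Proposed formulation: the `url` property values OR the option itself.
function getUrlProposed(sources) {
  return [...urlProperties(sources), ...sources];
}

// Rough analogue of restricting results with mayHaveStringValue(..).
const stringsOnly = (values) => values.filter((v) => typeof v === "string");

console.log(getUrlCurrent(optionSources)); // only urlA: urlB is missed
console.log(stringsOnly(getUrlProposed(optionSources))); // urlA and urlB
```

The unfiltered proposed result also contains the object literal itself, which is the spurious value; once results are restricted to string values it drops out.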

I'll change it to your suggestion.

esbena · 2019-11-26T08:47:57Z

+      // the first occurrence of the responseDataNode can be very disconnected from the instantiation of Bloodhound
+      // So I do this trick to get a taint-path that is readable to a developer.
+      // The above (possibly with added type-tracking) would be the correct way, but which gives unhelpful feedback to developers.
+      result = this


This is starting to become a common concern; I also mentioned this lack of a fully explained path in #2324. The problem is really a UI issue, though, and may be resolved by having multiple path explanations for each result. I think we should steer clear of this in QL for now. We can loop in @sj the next time we encounter this.

As such, I would prefer the other solution (using local data flow for starters), and then letting definitions.ql help the developer figure out why the flow is remote.

Besides, the current solution is semantically wrong: the response data node is not the instance, period. Treating ClientRequest as a remote flow source is just begging for unrelated problems when the request itself flows to some other sink. I agree that it could make sense to add any(RemoteBloodhoundClientRequest r) to Configuration::isSink(s) in a custom query, but in general we should stick to the semantically correct formulation.

esbena · 2019-11-26T08:52:53Z

+
+    TypeaheadSuggestionFunction() {
+      typeaheadCall = JQuery::objectRef().getAMethodCall("typeahead") and
+      this = typeaheadCall


This matches $(...).typeahead(..., { templates: { suggestion: <this> } }), right? Can we add that as a comment?

esbena · 2019-11-26T09:10:04Z

+      (
+        pred = this
+        or
+        pred = this.getAFunctionValue().getParameter(1).getACall().getAnArgument()


Can we add a source code example for this disjunct?
It would also be nice if another comment could explain why it makes sense that pred is defined by two disjuncts that are so different from each other. It smells like we are trying very hard to provide useful paths for the programmer, which we unfortunately can always do if we are willing to cut semantic corners (cf. the discussion of the RemoteBloodhoundClientRequest class).

After changing to the semantically correct solution (discussion above), this disjunct will no longer be necessary.
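For reference, a sketch of client code that the `getParameter(1).getACall().getAnArgument()` disjunct is aimed at: a `source` function written in the client itself, whose callback parameter is called with the suggestions. The parameter names `syncResults`/`asyncResults` and the hard-coded word list are illustrative; per commit 97718bf ("the callback function can both be the second and third argument") the callback may be either parameter.

```javascript
const dataset = {
  name: "dashboards",
  // `source` is a function; the disjunct matches arguments of calls to its
  // parameter at index 1 (here `syncResults`).
  source: function (query, syncResults, asyncResults) {
    const matches = ["alpha", "beta", "gamma"].filter((s) => s.startsWith(query));
    syncResults(matches); // `matches` is the `pred` node of the disjunct
  },
  templates: {
    // Each element of `matches` arrives here as `val`; returning it
    // unescaped is the XSS sink modelled by this PR.
    suggestion: function (val) {
      return "<div>" + val + "</div>";
    },
  },
};

// Drive the source by hand, the way typeahead.js would call it.
dataset.source("b", (results) => {
  console.log(results.map(dataset.templates.suggestion)); // [ '<div>beta</div>' ]
}, () => {});
```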

esbena · 2019-11-26T09:11:36Z

+   * A taint step that models a call to `.ttAdapter()` on an instance of Bloodhound.
+   */
+  class BloodHoundAdapterStep extends TaintTracking::AdditionalTaintStep, BloodhoundInstance {
+    DataFlow::Node successor;


This field is not bound.

esbena · 2019-11-26T09:12:43Z

+  /**
+   * A taint step that models a call to `.ttAdapter()` on an instance of Bloodhound.
+   */
+  class BloodHoundAdapterStep extends TaintTracking::AdditionalTaintStep, BloodhoundInstance {


This entire class should be removed if we use type-tracking in RemoteBloodhoundClientRequest::getAResponseDataNode, right?

esbena · 2019-11-26T09:18:55Z

+
+
+from Configuration cfg, DataFlow::PathNode source, DataFlow::PathNode sink
+where cfg.hasFlowPath(source, sink)


Can we add:

and source.getNode() instanceof RemoteServerResponse

or perhaps:

and source.getNode() instanceof HeuristicSource

to avoid reporting the ordinary XSS results twice in the tests?

esbena · 2019-11-26T09:42:30Z

 import semmle.javascript.frameworks.Electron
 import semmle.javascript.frameworks.Files
 import semmle.javascript.frameworks.Firebase
+import semmle.javascript.frameworks.typeahead


I think we want this spelled as Typeahead; jQuery is a special case. Also, this list of imports happens to be sorted.

esbena

Getting close.

One general thing:
Can you have a pass over all the predicates and consider which ones we really want to expose to the public?
I think I would like to play it safe, in the name of compatibility maintenance for this library, and make everything but TypeaheadSuggestionFunction private.

esbena · 2019-11-26T20:45:29Z

+      exists(DataFlow::TypeTracker t2 | result = ref(t2).track(t2, t))
+    }
+
+    DataFlow::SourceNode ref() { result = ref(DataFlow::TypeTracker::end()) }


See comment for the above predicate.

Co-Authored-By: Esben Sparre Andreasen <esbena@github.com>

erik-krogh · 2019-11-27T10:25:55Z

Is the model general enough to support suggestion providers other than Bloodhound? (I think it is, but please double-check; perhaps make a note of other common suggestion providers that may be worth modelling later.)

The model does not support other third-party suggestion providers, but it does support suggestion providers that are part of the client itself.
I looked through the source of all my benchmarks, and when another remote source is used, a wrapper function is usually needed in the client.
Our models play nicely together when such a wrapper function is used.
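A sketch of the wrapper pattern described above, under stated assumptions: `otherProvider` is a hypothetical third-party lookup API with a callback-style `lookup(query, done)`, and `wrappedSource` is an illustrative name. The client adapts the provider to typeahead's `source(query, syncResults, asyncResults)` shape, so the provider's results enter typeahead through a callback parameter of a client-side source function, which is presumably why the models compose.

```javascript
// Hypothetical third-party suggestion provider (not a real library). The stub
// answers immediately; a real provider would typically answer later.
const otherProvider = {
  lookup(query, done) {
    done([query + " one", query + " two"]);
  },
};

// Client-side wrapper adapting the provider to typeahead's source signature.
function wrappedSource(query, syncResults, asyncResults) {
  otherProvider.lookup(query, (results) => {
    asyncResults(results); // provider data enters typeahead via a callback parameter
  });
}

// Usage: the dataset would then be configured as { source: wrappedSource, ... }.
wrappedSource("q", () => {}, (results) => console.log(results)); // [ 'q one', 'q two' ]
```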

esbena

One final refactoring suggestion (changes semantics slightly)

esbena · 2019-11-27T10:47:08Z

+    DataFlow::CallNode getTypeaheadCall() { result = typeaheadCall }
+  }
+
+  /**


I am sorry for the repeated refactoring of this class. I think we can untangle the concepts a bit more:

/**
 * A `source` option for a typeahead.js plugin instance.
 */
private class TypeaheadSource extends DataFlow::ValueNode {
  DataFlow::CallNode typeaheadCall;

  TypeaheadSource() {
    typeaheadCall = JQuery::objectRef().getAMethodCall("typeahead") and
    this = typeaheadCall.getOptionArgument(1, "source")
  }

  /** Gets a node for a suggestion that this source motivates. */
  DataFlow::Node getASuggestion() {
    exists(TypeaheadSuggestionFunction suggestionCallback |
      suggestionCallback.getTypeaheadCall() = typeaheadCall and
      result = suggestionCallback.getParameter(0)
    )
  }
}

Now the source exists independently of a suggestion function, which seems semantically cleaner.

The generically named getSuccessor now has a more descriptive name, getASuggestion.

(If we were to redesign this typeahead model, I think I would introduce a class named TypeaheadInstance, defined as JQuery::objectRef().getAMethodCall("typeahead"). This class would then expose the connection between TypeaheadSource and TypeaheadSuggestionFunction, but the current visibility choice does not prevent us from doing that later.)

(If we were to redesign this typeahead model, I think I would introduce a class named TypeaheadInstance, defined as JQuery::objectRef().getAMethodCall("typeahead")

With your change JQuery::objectRef().getAMethodCall("typeahead") is already twice in the model, so I'll go ahead and refactor it.

erik-krogh · 2019-11-27T12:06:20Z

Performance is not that bad, but it's also not perfect.

esbena · 2019-11-27T12:51:48Z

Approved. But let's do a performance evaluation that includes a few more (taint) queries (or just run the security suite) before merging.

erik-krogh · 2019-11-27T12:55:03Z

Approved. But let's do a performance evaluation that includes a few more (taint) queries (or just run the security suite) before merging.

On it.
I miss the Romans.

erik-krogh · 2019-11-28T13:10:52Z

Approved. But let's do a performance evaluation that includes a few more (taint) queries (or just run the security suite) before merging.

https://git.semmle.com/erik/dist-compare-reports/tree/profiling-erik-krogh.northeurope.cloudapp.azure.com_1574920882307

Looks ok to me.
Can you do a re-approve?

esbena · 2019-11-29T18:58:03Z

Let's rerun the two ~~slowest projects~~ projects with the largest relative slowdowns (both shas); the 10% overheads are hopefully not reproducible.

erik-krogh · 2019-12-02T07:38:15Z

Let's rerun the two ~~slowest projects~~ projects with the largest relative slowdowns (both shas); the 10% overheads are hopefully not reproducible.

I did the 5 slowest projects instead of just 2.
The 10% overheads were not reproducible.
https://git.semmle.com/erik/dist-compare-reports/tree/profiling-erik-krogh.northeurope.cloudapp.azure.com_1575222915368

erik-krogh · 2019-12-02T13:31:59Z

And now I also got the tests to pass

add sources and sinks for typeahead.js

c7235bb

erik-krogh added the JS and WIP labels Nov 25, 2019

erik-krogh requested a review from a team as a code owner November 25, 2019 13:18

esbena requested changes Nov 26, 2019

erik-krogh added 7 commits November 26, 2019 12:52

add change note

b06acd1

the callback function can both be the second and third argument

97718bf

changes based on review feedback

4a94c49

change the typeahead.js model to be semantically correct

ace484a

Merge remote-tracking branch 'upstream/master' into typeAheadSink

5a0cabb

update expected output

7b262fa

simplify multiple parameter selection

9b608e9

esbena requested changes Nov 26, 2019

erik-krogh and others added 3 commits November 27, 2019 10:52

remove superfluous line break

6d63d75

Co-Authored-By: Esben Sparre Andreasen <esbena@github.com>

changes based on review feedback

60f7a7a

update expected test output

42fbcbf

esbena requested changes Nov 27, 2019

update test to not use private classes

4f75986

refactor classes in typeahead.js model

bafd57d

esbena previously approved these changes Nov 27, 2019

erik-krogh added 2 commits November 27, 2019 15:19

Merge remote-tracking branch 'upstream/master' into typeAheadSink

34e44e8

update expected output

d212394

erik-krogh dismissed esbena’s stale review via d212394 November 27, 2019 14:22

erik-krogh added 2 commits December 2, 2019 08:41

Merge remote-tracking branch 'upstream/master' into typeAheadSink

c6c1ebe

update expected test output

ea9d618

esbena approved these changes Dec 3, 2019

esbena removed the WIP label Dec 3, 2019

semmle-qlci merged commit cfcd18b into github:master Dec 3, 2019



erik-krogh commented Nov 25, 2019 • edited by esbena

esbena commented Nov 29, 2019 • edited

3 participants