Skip to content

JS: add sources and sinks for typeahead.js#2429

Merged
semmle-qlci merged 17 commits into
github:masterfrom
erik-krogh:typeAheadSink
Dec 3, 2019
Merged

JS: add sources and sinks for typeahead.js#2429
semmle-qlci merged 17 commits into
github:masterfrom
erik-krogh:typeAheadSink

Conversation

@erik-krogh
Copy link
Copy Markdown
Contributor

@erik-krogh erik-krogh commented Nov 25, 2019

The library: typeahead.js

This PR is inspired a CVE that is not flagged as a result of this PR (missing source).

There are multiple parts in this PR, the most important being the XSS sink, demonstrated by the below code:

$('.typeahead').typeahead({},
  {
    name: 'dashboards',
    source: function (query, cb) {
      var target = document.location.search // <- source
      cb(target); // <- target flows to the `val` param further down. 
    },
    templates: {
      // strings returned the `suggestion` function are inserted directly as HTML into the DOM. 
      suggestion: function(val) { 
        return val; // <- sink.  
      }
    }
  }
)	

The source can be more complicated, especially if a library is used to create it.
The below is an example of that, where the Bloodhound class (part of the typeahead.js library) is used to fetch results from a remote URL:

var autocompleter = new Bloodhound({
  prefetch: remoteUrl
})
autocompleter.initialize();
$('.typeahead').typeahead({}, {
  source: autocompleter.ttAdapter(),
  templates: {
    suggestion: function(val) { // val is a JSON object from `remoteUrl`. 
      return val; // sink
    }
  }
})

I've modelled this source using the ClientRequest class, but I've cheated.
The getAResponseDataNode method is supposed to return the first Node where the response occurs. In the above example that would be the val parameter.
However, presenting a developer with a taint-flow that is entirely inside the suggestion function, when the actual source is within the Bloodhound class, is hardly useful.
I've therefore made the getAResponseDataNode method return the Bloodhound instance itself, even though that is technically incorrect, because it gives a more useful taint-path.
The getAResponseDataNode also contain some skeleton of the code that would be required to do it the "right way" (it still needs added type-tracking).

I've found some TP's using this source/sink pair: https://lgtm.com/query/7028618148576176487/
(They require that semmle.javascript.heuristics.AdditionalSources is imported).
None of them are really dangerous.

@erik-krogh erik-krogh added JS WIP This is a work-in-progress, do not merge yet! labels Nov 25, 2019
@erik-krogh erik-krogh requested a review from a team as a code owner November 25, 2019 13:18
Copy link
Copy Markdown
Contributor

@esbena esbena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very nice, thank you for modelling how the source part of typeahead works.
My main concern is about the semantic consequences of the well-meaning improvements to the path-explanations.

Other things to consider:

  • performance evaluation since we have added default taint steps
  • change notes
  • is the model general enough to support other suggestion providers than bloodhound? (I think it is, but please double check, perhaps make a note of other common suggestion providers that may be worth modelling later).

module Typeahead {
/**
* A reference to the Bloodhound class, which is a utility-class for generating auto-complete suggestions.
* Sometimes these suggestions can originate from remote sources.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sometimes these suggestions can originate from remote sources.

This part of the docstring can be dropped. We care about remote sources elsewhere.

*/
class Bloodhound extends DataFlow::SourceNode {
Bloodhound() {
this = DataFlow::moduleImport("typeahead.js/dist/bloodhound.js")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😱
I think the more correct way to import bloodhound is as require('bloodhound-js') this package: https://www.npmjs.com/package/bloodhound-js. Can we support that as well?

* or an object containing an "url" property.
*/
override DataFlow::Node getUrl() {
if exists(option.getALocalSource().getAPropertyWrite("url"))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is usually fine to just formulate this as result = option.getALocalSource().getAPropertyWrite("url").getRhs() or result = option. I can't recall any examples right now though. The more I think about this, the more it feels like a deja vu discussion...

Both formulations have drawbacks, the current formulation would fail to find urlB in the following case:

new Bloodhound({remote: advanced? {url: urlA, ...}: urlB})

My proposed formulation would also find {url: urlA, ...} to be the URL,

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the getUrl() is used together with mayHaveStringValue(..), then the spurious value from your proposed formulation is not an issue..

I'll change it to your suggestion.

// the first occurrence of the responseDataNode can be very disconnected from the instantiation of Bloodhound
// So I do this trick to get a taint-path that is readable to a developer.
// The above (possibly with added type-tracking) would be the correct way, but which gives unhelpful feedback to developers.
result = this
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is starting to become a common concern, I also mentioned this lack of fully explained path in in #2324. The problem is a UI issue though, and may be resolved by having multiple path explanations for each result. I think we should steer clear of this in QL for now. We can loop @sj in the next time we encounter this.

As such, I would prefer the other solution (using local dataflow for starters) and then letting definitions.ql aid the developer in figuring out why the flow is remote.

Besides, the current solution is semantically wrong: the response data node is not the instance, period. Treating ClientRequest as a remote flow source is just begging for unrelated problems when the request itself flows to some other sink. I agree that it could make sense to add any(RemoteBloodhoundClientRequest r) to Configuration::isSink(s) in a custom query, but in general we should stick to the semantically correct formulation.


TypeaheadSuggestionFunction() {
typeaheadCall = JQuery::objectRef().getAMethodCall("typeahead") and
this = typeaheadCall
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this matches $(...).typeahead(..., { templates: { suggestion: <this> } }), right? Can we add that as a comment?

(
pred = this
or
pred = this.getAFunctionValue().getParameter(1).getACall().getAnArgument()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a source code example for this disjunct?
It would also be nice if another comment could explain why it makes sense that pred is defined by two disjuncts that are so different from each other. It smells like we are trying very hard to provide useful paths for the programmer, which we unfortunately always can if it means that we are cutting semantic corners (c.f. discussion for the RemoteBloodhoundClientRequest class).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After changing to the semantically correct solution (discussion above), this disjunct will no longer be necessary.

* A taint step that models a call to `.ttAdapter()` on an instance of Bloodhound.
*/
class BloodHoundAdapterStep extends TaintTracking::AdditionalTaintStep, BloodhoundInstance {
DataFlow::Node successor;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This field is not bound.

/**
* A taint step that models a call to `.ttAdapter()` on an instance of Bloodhound.
*/
class BloodHoundAdapterStep extends TaintTracking::AdditionalTaintStep, BloodhoundInstance {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This entire class should be removed if we use typetracking in RemoteBloodhoundClientRequest::getAResponseDataNode, right.



from Configuration cfg, DataFlow::PathNode source, DataFlow::PathNode sink
where cfg.hasFlowPath(source, sink)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add:

and source.getNode() instanceof RemoteServerResponse

or perhaps:

and source.getNode() instanceof HeuristicSource

to avoid reporting the ordinary XSS results twice in the tests?

Comment thread javascript/ql/src/javascript.qll Outdated
import semmle.javascript.frameworks.Electron
import semmle.javascript.frameworks.Files
import semmle.javascript.frameworks.Firebase
import semmle.javascript.frameworks.typeahead
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want this spelled as Typeahead, jQuery is a special case. Also, this list of imports happens to be sorted.

Copy link
Copy Markdown
Contributor

@esbena esbena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting close.

One general thing:
Can you have a pass over all the predicates and consider which ones we really want to expose to the public?
I think I would like to play it safe in the name of compatibility maintenance with this library and make everything but TypeaheadSuggestionFunction private.

Comment thread javascript/ql/src/semmle/javascript/frameworks/Typeahead.qll Outdated
Comment thread javascript/ql/src/semmle/javascript/frameworks/Typeahead.qll Outdated
exists(DataFlow::TypeTracker t2 | result = ref(t2).track(t2, t))
}

DataFlow::SourceNode ref() { result = ref(DataFlow::TypeTracker::end()) }
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment for the above predicate.

Comment thread javascript/ql/src/semmle/javascript/frameworks/Typeahead.qll Outdated
Comment thread javascript/ql/src/semmle/javascript/frameworks/Typeahead.qll Outdated
Comment thread javascript/ql/src/semmle/javascript/frameworks/Typeahead.qll Outdated
Comment thread change-notes/1.24/analysis-javascript.md
@erik-krogh
Copy link
Copy Markdown
Contributor Author

  • is the model general enough to support other suggestion providers than bloodhound? (I think it is, but please double check, perhaps make a note of other common suggestion providers that may be worth modelling later).

The model does not support other third party suggestion providers. But it does support suggestion providers that are part of the client itself.
I looked through the source in all of my benchmarks, and if another remote source is used then a wrapper function is usually needed in the client.
And our models play nicely together when such a wrapper function is used.

Copy link
Copy Markdown
Contributor

@esbena esbena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One final refactoring suggestion (changes semantics slightly)

DataFlow::CallNode getTypeaheadCall() { result = typeaheadCall }
}

/**
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am sorry for the repeated refactoring of this class. I think we can untangle the concepts a bit more:

/**
 * A `source` option for a typeahead.js plugin instance.
 */
private class TypeaheadSource extends DataFlow::ValueNode {
  DataFlow::CallNode typeaheadCall;

  TypeaheadSource() {
    typeaheadCall = JQuery::objectRef().getAMethodCall("typeahead") and
    this = typeaheadCall.getOptionArgument(1, "source")
  }

  /** Gets a node for a suggestion that this source motivates. */
  DataFlow::Node getASuggestion() {
    exists(TypeaheadSuggestionFunction suggestionCallback |
      suggestionCallback.getTypeaheadCall() = typeaheadCall and
      result = suggestionCallback.getParameter(0)
    )
  }
}
  • Now the source exists independently of a suggestion function, which seems semantically cleaner.
  • The genericly named getSuccessor has a more descriptive name

(If we were to redesign this typeahead model. I think I would introduce a class namedTypeaheadInstance, defined as JQuery::objectRef().getAMethodCall("typeahead"), this class would then expose the connection between the TypeaheadSource and TypeaheadSuggestionFunction, but the current visibility choice does not prevent us from doing that later.)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(If we were to redesign this typeahead model. I think I would introduce a class namedTypeaheadInstance, defined as JQuery::objectRef().getAMethodCall("typeahead")

With your change JQuery::objectRef().getAMethodCall("typeahead") is already twice in the model, so I'll go ahead and refactor it.

@erik-krogh
Copy link
Copy Markdown
Contributor Author

Performance is not that bad, but its also not perfect.

esbena
esbena previously approved these changes Nov 27, 2019
@esbena
Copy link
Copy Markdown
Contributor

esbena commented Nov 27, 2019

Approved. But lets do a performance evaluation that includes a few more (taint) queries (or just run the security suite) before merging.

@erik-krogh
Copy link
Copy Markdown
Contributor Author

Approved. But lets do a performance evaluation that includes a few more (taint) queries (or just run the security suite) before merging.

On it.
I miss the romans.

@erik-krogh
Copy link
Copy Markdown
Contributor Author

Approved. But lets do a performance evaluation that includes a few more (taint) queries (or just run the security suite) before merging.

https://git.semmle.com/erik/dist-compare-reports/tree/profiling-erik-krogh.northeurope.cloudapp.azure.com_1574920882307

Looks ok to me.
Can you do a re-approve?

@esbena
Copy link
Copy Markdown
Contributor

esbena commented Nov 29, 2019

Lets rerun the two slowest projects projects with the largest relative slowdowns (both shas), the 10% overheads are hopefully not reproducible.

@erik-krogh
Copy link
Copy Markdown
Contributor Author

Lets rerun the two slowest projects projects with the largest relative slowdowns (both shas), the 10% overheads are hopefully not reproducible.

I did the 5 slowest projects instead of just 2.
The 10% overheads were not reproducible.
https://git.semmle.com/erik/dist-compare-reports/tree/profiling-erik-krogh.northeurope.cloudapp.azure.com_1575222915368

@erik-krogh
Copy link
Copy Markdown
Contributor Author

And now I also got the tests to pass

@esbena esbena removed the WIP This is a work-in-progress, do not merge yet! label Dec 3, 2019
@semmle-qlci semmle-qlci merged commit cfcd18b into github:master Dec 3, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants