Python: Add API graph support for parameter annotations #18112
Adds API graph support for observing that in

```python
def foo(x : Bar):
    ...
```

the variable `x` is likely to be an instance of the type `Bar` inside this function.

In particular, we add `getInstanceFromAnnotation` as a predicate on API graph nodes that tracks this step (corresponding to a new edge type labeled with "annotation" in the API graph), and extend the existing `getAnInstance` predicate to also include instances arising from type annotations.

A more complete solution would also add support for annotated assignments (`x : Foo = ...` or just `x : Foo`) as well as track types through type aliases (`type Foo = Bar`). This turns out to be non-trivial, however, as these type constructs don't have any CFG nodes (and so no data-flow nodes by default either). In order to not have perfect be the enemy of good, this commit is only targeting the type parameter case (which is also likely to be the most common use case anyway).

The tests for API graphs have been extended accordingly, including tests for the kinds of type ascriptions that we _don't_ currently model in API graphs (marked with `MISSING:` in the inline tests).
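For readers less familiar with the annotation forms mentioned above, here is a minimal sketch of the three cases side by side. The class `Bar`, its `method`, and the names `foo`, `baz`, `y`, `z`, and `Alias` are hypothetical and exist only for illustration; they are not taken from the PR's test files. The alias uses a plain assignment so that the snippet also runs on Python versions older than 3.12, where the `type Foo = Bar` statement is not available.

```python
class Bar:
    """Hypothetical class standing in for a library type."""

    def method(self) -> str:
        return "bar"


# Parameter annotation: the case this PR targets.
# Inside `foo`, `x` is likely to be an instance of `Bar`.
def foo(x: Bar) -> str:
    return x.method()


# Annotated assignment with a value: described as not yet modelled.
y: Bar = Bar()

# Bare annotated assignment (no value): described as not yet modelled.
z: Bar

# Simple type alias: described as not yet modelled.
# (Python 3.12+ also allows the statement form `type Alias = Bar`.)
Alias = Bar


def baz(w: Alias) -> str:
    return w.method()
```

Only the first form corresponds to the new "annotation" edge described above; the remaining forms are the ones the description says are marked `MISSING:` in the inline tests.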
Performance comparison looks completely uneventful. Opening this up for review.
local_x #$ MISSING: use=moduleImport("types").getMember("AssignmentAnnotation").getAnnotatedInstance()
global_x : AssignmentAnnotation #$ use=moduleImport("types").getMember("AssignmentAnnotation")
global_x #$ MISSING: use=moduleImport("types").getMember("AssignmentAnnotation").getAnnotatedInstance()
Why is this missing? Is it because there is no assignment on the line above, so that `global_x` is not in `getTarget` (which is presumably empty)?
Hmm... This is a very salient question. If I quick-eval the `annotatedInstance` predicate, I get four results:
ControlFlowNode for ImportMember, ControlFlowNode for global_x
ControlFlowNode for ImportMember, ControlFlowNode for parameter_y
ControlFlowNode for Alias, ControlFlowNode for parameter_z
ControlFlowNode for Alias, ControlFlowNode for global_z
So, we are picking up the instancing from `from ... import AssignmentAnnotation` to `global_x : AssignmentAnnotation`, but we're not picking up that the annotation is a use of that same identifier as in the import statement. What's curious, then, is that `global_x` isn't seen as an instance of `AssignmentAnnotation`. For `global_z` it makes sense, since we don't understand the simple type aliasing that's taking place on line 13.
Looking at `getTarget`, it does exist for the type ascription of `global_x`, but it seems that we do not track the flow between the two occurrences of `global_x`. I'm trying to figure out why now.
A thought occurred to me after writing that message. Could it be that we're observing that `global_x` gets overwritten here (because it's the target of an assignment), but then when we go to see what value was assigned we don't find it (because it's just a type ascription)? That would explain the weird behaviour.
Yes, that could be it. I wonder if `global_x` in `global_x : AssignmentAnnotation` should actually be considered a use rather than a def...
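As background to the use-versus-def question, the following sketch shows how Python itself treats the two forms at runtime. It illustrates language semantics only and says nothing about how the CodeQL libraries model the statement internally. The class below is a hypothetical stand-in for the imported `AssignmentAnnotation`, and the helper function names are invented for this example.

```python
class AssignmentAnnotation:
    """Hypothetical stand-in for the class imported in the test file."""


def bare_annotation_binds_name() -> bool:
    # Run a bare annotation as module-level code in a fresh namespace.
    namespace = {"AssignmentAnnotation": AssignmentAnnotation}
    exec("global_x : AssignmentAnnotation", namespace)
    # The annotation is recorded, but no value is bound to `global_x`.
    return "global_x" in namespace


def annotated_assignment_binds_name() -> bool:
    # Run an annotated assignment with a value in a fresh namespace.
    namespace = {"AssignmentAnnotation": AssignmentAnnotation}
    exec("global_x : AssignmentAnnotation = AssignmentAnnotation()", namespace)
    # Here `global_x` is bound like in any ordinary assignment.
    return "global_x" in namespace
```

With these helpers, `bare_annotation_binds_name()` returns `False` and `annotated_assignment_binds_name()` returns `True`: a bare annotation never assigns a value, which is consistent with the hypothesis above that there is no assigned value to find.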
yoff left a comment:
As discussed offline, let us merge this now; it is a clear improvement :-)