Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python: Add support for Python 3.12 type syntax #14636

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

tausbn
Copy link
Contributor

@tausbn tausbn commented Oct 30, 2023

Depends on an internal PR.

@tausbn tausbn added the depends on internal PR This PR should only be merged in sync with an internal Semmle PR label Oct 30, 2023
@tausbn tausbn force-pushed the tausbn/python-add-support-for-python-3.12-type-syntax branch 7 times, most recently from 19b2af6 to 88a163e Compare November 6, 2023 13:42
For these, I opted for a placement that would cause as few changes to the
dbscheme as possible. This puts the new `type_parameters` fields as the
last field on function and class definitions.
In the upgrade direction, we simply do nothing.

In the downgrade direction, we remove the two new relations, and
also any `Stmt` nodes corresponding to `TypeAlias` nodes.
With `AstNode` defined as a union of other classes, we don't get this for free.

(Compare with `DictItem`, which is in a similar situation.)
@tausbn tausbn force-pushed the tausbn/python-add-support-for-python-3.12-type-syntax branch from 88a163e to 75e6de8 Compare November 6, 2023 13:51
@tausbn tausbn marked this pull request as ready for review November 6, 2023 14:44
@tausbn tausbn requested a review from a team as a code owner November 6, 2023 14:44
Copy link
Member

@RasmusWL RasmusWL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides the comment about .getScope, these changes LGTM 👍


override AstNode getAChildNode() { none() }

override Scope getScope() { none() }
Copy link
Member

@RasmusWL RasmusWL Nov 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it looks strange that the getScope method has no result.

I investigated how DictItem works, which also implements getScope as none(). However, I found that all its' subclasses provide an overriding getScope implementation:

So any DictItem will actually have a valid .getScope() result due to this.

As far as I can tell, currently no other AstNode have empty .getScope, so I think it could make sense to add something here as well 😊

I wasn't quite sure what the scope should be though 🤔 So I had a look at the official Python grammar. To me it seems reasonable that the scope should be the class/function the type-parameter is attached to, or the same scope as the containing TypeAlias -- what do you think?

Closing thought: I guess it's this is more of a nitpick than a super important problem that will make everything break, but it's just sort of strange that a AST node doesn't have any scope 🤔 🙃

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, you're completely right. Good catch!

The scoping of type variables is a bit peculiar in Python, so I'll have to think about what the right solution is. I agree that for def foo[T](...) and class Foo[T]: ..., the scope of T is the function/class itself (as this construction makes T available for the body of the function/class).

For type Foo[T] = expr, I feel like this creates a scope of its own, since it allows T to be referenced inside expr, but not outside the type alias. In practice, however, it's probably harmless to give it the same scope as the one containing the type alias.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For type Foo[T] = expr, I feel like this creates a scope of its own, since it allows T to be referenced inside expr, but not outside the type alias. In practice, however, it's probably harmless to give it the same scope as the one containing the type alias.

Ah, I had missed that part. In What's new in Python 3.12 they mention:

In order to support these scoping semantics, a new kind of scope is introduced, the annotation scope. Annotation scopes behave for the most part like function scopes, but interact differently with enclosing class scopes. In Python 3.13, annotations will also be evaluated in annotation scopes.

So I guess a "proper" fix is to follow the scoping behaviors described here: https://peps.python.org/pep-0695/#scoping-behavior

I guess this calls for a test of whether we can resolve a reference to T inside expr -- I'm actually not 100% sure what the result would be, although I not super optimistic about it 😬

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about this for a bit, and the way I see it there are two options here.

Firstly, we can accept the current state (+ the scope fixes I'll be adding shortly). That is

  • Type parameters on functions and classes are scoped to these entities.
  • Type parameters on type aliases have the same scope as the alias itself (which is wrong, but harmless).
  • There are no control-flow nodes for any of these new constructs (in line with the fact that they are not eagerly evaluated in Python).
  • There's no data-flow for these values (because they don't ever have any values with which to flow).

Alternatively, we could add control-flow nodes for these constructs, and pretend that they do get evaluated at runtime (we can allow ourselves to be a bit more lax than the Python interpreter). In particular, we would want control-flow that evaluates type bounds before the body of the function/class/type alias, and also that the right hand side of a type alias gets evaluated before the next statement. We could also add flow steps from these parameters to the places where they are used (taking into account things like shadowing of type parameters).

The thing that's not clear to me is what we actually gain from this more precise modelling. The best I can come up with is that we'll (maybe?) have a harder time
modelling things like

def request_handler[R: Request](r: R):
    ...

where we have a handler that's generic in some overall type of requests, and we want to identify that r is indeed a value containing a request. But it also seems to me that this should be possible to model without the need of data-flow. In particular, it seems to me that these type parameters are only meant to be used within a given scope (that is, what would it even mean to pass a type parameter as an argument into a call to some other function?). If we really do want this option, though, I think we should also extend our modelling to support defining these things using the typing module (seeing as not a lot of code will be using Python 3.12 yet).

The two options above are not incompatible. We can easily do the first, and then revisit it if it turns out we need the second solution at some point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
depends on internal PR This PR should only be merged in sync with an internal Semmle PR Python
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants