[red-knot] initial (very incomplete) flow graph by carljm · Pull Request #11624 · astral-sh/ruff

carljm · 2024-05-31T03:21:51Z

Summary

Introduces the skeleton of the flow graph. So far it doesn't actually handle any non-linear control flow :) But it does show how we can go from an expression that references a symbol, backward through the flow graph, to find reachable definitions of that symbol.

Adding non-linear control flow will mean adding flow nodes with multiple predecessors, which will introduce more complexity into ReachableDefinitionsIterator.next(). But one step at a time.

Test Plan

Added a (very basic) test.

github-actions · 2024-05-31T03:35:09Z

`ruff-ecosystem` results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

MichaReiser · 2024-05-31T06:55:32Z

            table: SymbolTable::new(),
            scopes: vec![root_scope_id],
            current_definition: None,
+            current_flow_node: FlowGraph::start(),


I would have expected that the flow graph is separate from the symbol table so that we can built it separately. Maybe that's not worth it because of the overhead of a second AST pass. But it just doesn't feel like it's part of a symbol table.

Two other reasons why it might be worth to move it out of SymbolTable

I think it could be interesting to build the flow graph at a more granular level. E.g. allow querying the top-level module graph or the module graph per function, or class. I think that could be interesting because building the CFG can be somewhat involved and we should avoid building it for code that we never end up analyzing (This assumes that basic blocks never have edges to basic blocks from other functions)

We need to think about how we want to split SymbolTable long term between the data we want to store in a persistent cache and we probably don't want to cache the CFG (but it might as well be that we don't cache anything in SymbolTable, so that this isn't a concern)

another reason for splitting the CFG out is that we may want to navigate the AST in a different order. I think our existing implementation navigates the AST backward to ensure it knows the branch targets ahead of time.

https://github.com/augustelalande/ruff/blob/87a3e967d4e8df53989ff1d1512749ba1df23328/crates/ruff_linter/src/rules/ruff/rules/unreachable.rs#L737-L740

I initially thought the same, and even started building it separately initially. But after working on it for a while, I concluded that it should not be separate.

The CFG is very closely tied to Definitions. The main purpose of our CFG is for determining reachability of definitions. Definitions are also tied to a Symbol. It will be a lot more difficult to reconstruct these relationships if we build CFG in a separate pass from symbol tables; it's much simpler if we build them together. And it's also more efficient to do a single AST traversal, rather than two.

There is never a scenario where we need (local) symbols and definitions, and we don't need a CFG, or vice versa. We always need to use them together. Therefore there isn't a gain in keeping them separate.

I agree that it's awkward naming-wise; I don't think CFG is really "part of" a symbol table, but that is currently how it's structured in this PR. This is mostly just because I wanted to get feedback on the PR before embarking on more invasive renamings / restructuring (and I'm kind of trying to avoid big restructurings anyway while Salsa rework is in progress.) I think that ultimately this should be restructured so we have a different name (in one doc I called it "indexing", pyright calls it "binding") for this information-collecting AST pass, and then symbol tables and flow graph would both be part of the "index" or whatever we call it, rather than having flow graph be part of symbol tables. But we would still build them together.

I also agree that we may want to explore "indexing" at a more granular level than whole-module-at-a-time. (I'm not totally sure this will actually make sense, but it's worth discussing at least.) But if we do this, it should apply to symbol tables and definitions also, it shouldn't apply only to CFG. If we don't need the CFG of some function, we also don't need its symbol tables or definitions. If we need its symbol tables or definitions, we need the CFG (otherwise we have no idea which definitions are relevant.) So more granular indexing does not imply doing something different for CFG than for symbol definitions.

Related to the previous point, I also agree we need to think about symbol table splitting, but again I think this is orthogonal to CFG. Wherever we have definitions, we should have CFG. So CFG should always be tied together with "local symbols" -- the stuff we need for local analysis -- but separate from "public symbols" and their types -- the stuff we persistent cache.

I am quite confident after looking at pyright and the ruff PR that for our purposes there is no reason to build CFG by backwards AST traversal, so I don't think that's a consideration. (If you are building a CFG that you want to traverse in the forwards direction, it can be more convenient to build it by backwards AST traversal, because you want to track successors of each node. But we are building a CFG for backwards traversal, so we want predecessors of each node; it's more convenient to build this by forwards AST traversal. Of course if you ultimately want both successor and predecessor linkages, you do this by making another pass over the CFG once it is initially built with linkage in just one direction.)

MichaReiser · 2024-05-31T06:57:56Z

        &self.dependencies
    }

+    pub(crate) fn reachable_definitions(


Can we add some documentation to what this iterator returns. What I understand is that it finds the definitions that are reachable from the current node by navigating backwards through the CFG? How does this work for definitions that aren't defined in the same function?

+1 for some more docs here

Sure, I can add a docstring here.

What I understand is that it finds the definitions that are reachable from the current node by navigating backwards through the CFG?

Yes, that's correct.

How does this work for definitions that aren't defined in the same function?

We only consider local definitions when deciding the type of a symbol. This is necessary for efficient type inference, otherwise the entire program would have to be analyzed in order to decide the type of any module-level symbol.

Imports count as local definitions, and of course type inference will traverse the import and look at the type of the imported name to decide the type of the import "definition." So reachability analysis only needs to worry about finding the local import definition. Similarly function arguments are definitions; in this case the only type information we have is the annotation on the argument, since we don't want to have to find every call site of the function. (That would mean we have to analyze all dependent modules in order to infer types in any function.) We may in some cases infer a relationship via our local CFG between the type of an argument and the function's return type, which we would model using typevars; this can still allow some amount of cross-function inference.

Assignments to a name from outside its scope (e.g. a nested function using nonlocal to assign to a name in an enclosing scope, a function using global to assign to a module-level name, or a module doing import mod; mod.a = ...) should be compatible with its public type. Those assignments will be analyzed when their own scope is analyzed. Names that might be subject to such external reassignments (i.e. module-level names, or cellvars that are assigned to by a nested scope) do need special handling in type inference / narrowing to account for the possibility of external reassignment.

MichaReiser · 2024-05-31T06:58:42Z

+                FlowNode::Definition(def_node) => {
+                    self.flow_node_id = def_node.predecessor;
+                    if def_node.symbol_id == self.symbol_id {
+                        return Some(def_node.definition.clone());


We should look into making Definition Copy if that's something we return frequently. I add it to our monthly plan.

carljm · 2024-05-31T20:27:13Z

Going to go ahead and merge, remaining comments are all larger issues that may need discussion and can be changed in future.

carljm added 2 commits May 30, 2024 19:42

[red-knot] expose constructor for NodeKey

04fabe5

[red-knot] initial (very incomplete) flow graph

e23371f

carljm requested review from AlexWaygood and MichaReiser May 31, 2024 03:24

carljm added the internal An internal refactor or improvement label May 31, 2024

carljm requested a review from plredmond May 31, 2024 03:40

MichaReiser approved these changes May 31, 2024

View reviewed changes

add docstring for reachable_definitions

3b691d2

carljm merged commit 27f6f04 into astral-sh:main May 31, 2024

This was referenced May 31, 2024

[red-knot] backwards-traversable CFG for discovering reachable definitions of a symbol #11655

Closed

[red-knot] symbol table should not own the flow graph #11657

Closed

carljm deleted the cfg branch May 31, 2024 23:56

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[red-knot] initial (very incomplete) flow graph#11624

[red-knot] initial (very incomplete) flow graph#11624
carljm merged 3 commits intoastral-sh:mainfrom
carljm:cfg

carljm commented May 31, 2024 •

edited

Loading

Uh oh!

github-actions Bot commented May 31, 2024 •

edited

Loading

Uh oh!

MichaReiser May 31, 2024

Uh oh!

MichaReiser May 31, 2024

Uh oh!

carljm May 31, 2024 •

edited

Loading

Uh oh!

MichaReiser May 31, 2024

Uh oh!

AlexWaygood May 31, 2024

Uh oh!

carljm May 31, 2024 •

edited

Loading

Uh oh!

MichaReiser May 31, 2024

Uh oh!

carljm commented May 31, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

carljm commented May 31, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test Plan

Uh oh!

github-actions Bot commented May 31, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

ruff-ecosystem results

Linter (stable)

Linter (preview)

Uh oh!

MichaReiser May 31, 2024

Choose a reason for hiding this comment

Uh oh!

MichaReiser May 31, 2024

Choose a reason for hiding this comment

Uh oh!

carljm May 31, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

MichaReiser May 31, 2024

Choose a reason for hiding this comment

Uh oh!

AlexWaygood May 31, 2024

Choose a reason for hiding this comment

Uh oh!

carljm May 31, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

MichaReiser May 31, 2024

Choose a reason for hiding this comment

Uh oh!

carljm commented May 31, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

carljm commented May 31, 2024 •

edited

Loading

github-actions Bot commented May 31, 2024 •

edited

Loading

`ruff-ecosystem` results

carljm May 31, 2024 •

edited

Loading

carljm May 31, 2024 •

edited

Loading