Skip to content

Swift: reorganize VarDecl instances within BraceStmt#13240

Merged
d10c merged 12 commits into
github:mainfrom
d10c:swift/brace-stmt-variables
Jun 14, 2023
Merged

Swift: reorganize VarDecl instances within BraceStmt#13240
d10c merged 12 commits into
github:mainfrom
d10c:swift/brace-stmt-variables

Conversation

@d10c
Copy link
Copy Markdown
Contributor

@d10c d10c commented May 22, 2023

At the QL level, this partitions the children of BraceStmt into VarDecls (as getVariable) and non-VarDecls (as getElement).

See https://github.com/github/codeql-c-team/issues/1223

@github-actions github-actions Bot added the Swift label May 22, 2023
@redsun82
Copy link
Copy Markdown
Contributor

This goes a bit against the philosophy we tried to stick to, to put as little logic as possible in the extractor when we could do things on the QL side. Shouldn't we stick to that, and do this on QL overriding getImmediateElement, using the rank function to do the indexed filtering?

@redsun82
Copy link
Copy Markdown
Contributor

redsun82 commented May 23, 2023

You might argue that it is simpler to implement this on the extractor side, but a counterargument there is that whatever you save in QL complexity, you still have to put into the upgrade/downgrade scripts any way.

Copy link
Copy Markdown
Contributor

@geoffw0 geoffw0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand this, getVariable will return just the variables but getElement will still return everything (including the variables) under the BraceStmt? My understanding was that the original intent was to have getElement return just the non-variable ("proper AST") children. Has this intent changed?

Comment thread swift/ql/lib/codeql/swift/generated/stmt/BraceStmt.qll Outdated
@redsun82
Copy link
Copy Markdown
Contributor

Oh, I can see now an issue with doing it solely on the QL side: we still want the variables to appear as AST children, which is hard to do currently for "synthesized" properties such as this would be. Some codegen support is required, let me have a quick look.

@d10c
Copy link
Copy Markdown
Contributor Author

d10c commented May 23, 2023

@redsun82 Yes, a schema change is needed due to wanting variables to be a child of BraceStmt. But this can also be done without an extractor change or non-trivial upgrade script, by overriding both getImmediateElement and getImmediateVariable to move the variables from the former to the latter. It's not elegant, but do you think it's worth it?

@d10c d10c force-pushed the swift/brace-stmt-variables branch from a8b46e9 to 4f6826f Compare May 23, 2023 13:20
@d10c d10c changed the base branch from main to codeql-cli-2.13.3 May 23, 2023 13:22
@redsun82
Copy link
Copy Markdown
Contributor

@redsun82 Yes, a schema change is needed due to wanting variables to be a child of BraceStmt. But this can also be done without an extractor change or non-trivial upgrade script, by overriding both getImmediateElement and getImmediateVariable to move the variables from the former to the latter. It's not elegant, but do you think it's worth it?

That would still require a (no-op) upgrade/downgrade script because of the dbscheme change. But I can add a synth property modifier to enable synthesized properties of non-synthesized classes, which should enable an elegant solution. In the meantime though you could already go for what you propose, the hand-written QL code will be exactly the same.

Let's keep in mind that as @geoffw0 pointed out, this will still be in conflict with #13249 (after that, you'll need to override getVariable instead of getImmediateVariable), but let's see what PR is ready first 🙂

@redsun82
Copy link
Copy Markdown
Contributor

if you rebase on top of #13258 you should be good to go to give a clean QL-only solution

@d10c d10c force-pushed the swift/brace-stmt-variables branch from 4f6826f to 67696da Compare May 23, 2023 14:36
Comment thread swift/schema.py Outdated
@d10c d10c force-pushed the swift/brace-stmt-variables branch from 67696da to 9e75953 Compare June 5, 2023 08:37
Comment thread swift/ql/lib/codeql/swift/elements/stmt/BraceStmt.qll Fixed
@d10c d10c force-pushed the swift/brace-stmt-variables branch 3 times, most recently from 5a6357a to 46d03c6 Compare June 6, 2023 13:37
# 65| var ... = ...
#-----| -> x1

# 65| x1
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we fine with the general lack of VarDecl nodes in the CFG of a BraceStmt? Even though they are apparently unnecessary for data flow.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably a bit unexpected from a user point-of-view, but we expect that very few people write queries directly on the control-flow graph. So it's probably fine.

@d10c d10c force-pushed the swift/brace-stmt-variables branch from efdc0f7 to a831456 Compare June 6, 2023 14:55
@d10c d10c marked this pull request as ready for review June 6, 2023 15:40
@d10c d10c requested a review from a team as a code owner June 6, 2023 15:40
d10c added 3 commits June 7, 2023 10:51
Ideally the EDB itself should contain a direct
reference from NamedPattern to VarDecl, not just a name,
but oh well, this join order works fine.

BEFORE:

```
[2023-06-08 11:40:01] Evaluated non-recursive predicate quick_eval#ff@60fe07kr in 6533ms (size: 91309).
Evaluated relational algebra for predicate quick_eval#ff@60fe07kr with tuple counts:
          1209062   ~3%    {2} r1 = SCAN VarDecl#914e0d1e::Generated::VarDecl::getName#0#dispred#ff OUTPUT In.1, In.0
        234687793   ~0%    {2} r2 = JOIN r1 WITH NamedPattern#c3d26570::Generated::NamedPattern::getName#0#dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Lhs.1, Rhs.1
         19112791   ~0%    {3} r3 = JOIN r2 WITH VarDecl#914e0d1e::Generated::VarDecl::getImmediateParentPattern#0#dispred#ff ON FIRST 1 OUTPUT Rhs.1, Lhs.0, Lhs.1

         19112791   ~0%    {3} r4 = JOIN r3 WITH Element#e67432df::Generated::Element::resolve#bf ON FIRST 1 OUTPUT Rhs.1, Lhs.2, Lhs.1
            24647   ~0%    {2} r5 = JOIN r4 WITH Element#d22cfd66::Element::getFullyUnresolved#bf ON FIRST 2 OUTPUT Lhs.1, Lhs.2

         19112791   ~0%    {3} r6 = JOIN r3 WITH Element#e67432df::Generated::Element::resolve#bf ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2
         19112791   ~3%    {3} r7 = JOIN r6 WITH Element#d22cfd66::Element::getFullyUnresolved#bf ON FIRST 1 OUTPUT Lhs.2, Rhs.1, Lhs.1
            66662   ~4%    {2} r8 = JOIN r7 WITH #Pattern#19b8cf65::Pattern::getImmediateEnclosingPattern#0#dispredPlus#bf ON FIRST 2 OUTPUT Lhs.0, Lhs.2

            91309   ~2%    {2} r9 = r5 UNION r8
                           return r9
```

AFTER:

```
[2023-06-08 11:55:26] Evaluated non-recursive predicate quick_eval#ff@fe906afo in 26ms (size: 91309).
Evaluated relational algebra for predicate quick_eval#ff@fe906afo with tuple counts:
         92048   ~0%    {3} r1 = SCAN NamedPattern#c3d26570::Generated::NamedPattern::getName#0#dispred#ff OUTPUT In.0, In.1, In.0

         82893   ~0%    {2} r2 = SCAN #Pattern#19b8cf65::Pattern::getImmediateEnclosingPattern#0#dispredPlus#fb#flipped OUTPUT In.1, In.0
         66417   ~1%    {3} r3 = JOIN r2 WITH NamedPattern#c3d26570::Generated::NamedPattern::getName#0#dispred#ff ON FIRST 1 OUTPUT Lhs.1, Rhs.1, Lhs.0

        158465   ~0%    {3} r4 = r1 UNION r3
         94246   ~3%    {3} r5 = JOIN r4 WITH VarDecl#914e0d1e::Generated::VarDecl::getImmediateParentPattern#0#dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2
         91309   ~2%    {2} r6 = JOIN r5 WITH VarDecl#914e0d1e::Generated::VarDecl::getName#0#dispred#ff ON FIRST 2 OUTPUT Lhs.2, Lhs.0
                        return r6
```
Comment thread swift/ql/lib/codeql/swift/elements/pattern/NamedPattern.qll Outdated
Copy link
Copy Markdown
Contributor

@geoffw0 geoffw0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a lot to add. This LGTM apart from the performance issue. If you don't find an answer quickly, I'd be happy to help with that.

---
category: breaking
---
* The `BraceStmt` AST node's `AstNode getElement(index)` member predicate no longer returns `VarDecl`s after the `PatternBindingDecl` that declares them. Instead, a new `VarDecl getVariable(index)` predicate has been introduced for accessing the variables declared in a `BraceStmt`. This change only affects query writers.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

d10c added 2 commits June 14, 2023 14:52
This brings it down to 85ms when run from a query, not just from quick-eval:

```

[2023-06-14 14:47:06] Evaluated non-recursive predicate NamedPattern#1696c0d8::NamedPattern::getVarDecl#0#dispred#ff@04392e6o in 85ms (size: 91309).
Evaluated relational algebra for predicate NamedPattern#1696c0d8::NamedPattern::getVarDecl#0#dispred#ff@04392e6o with tuple counts:
        1310544   ~9%    {2} r1 = SCAN var_decls OUTPUT In.0, In.1
        1209062   ~0%    {2} r2 = STREAM DEDUP r1
        1209062   ~0%    {2} r3 = JOIN r2 WITH Synth#5f134a93::Synth::convertVarDeclToRaw#1#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1, Lhs.1
          91309   ~0%    {3} r4 = JOIN r3 WITH VarDecl#914e0d1e::Generated::VarDecl::getImmediateParentPattern#0#dispred#ff ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.0

          69599   ~0%    {3} r5 = JOIN r4 WITH #Pattern#19b8cf65::Pattern::getImmediateEnclosingPattern#0#dispredPlus#bf_10#join_rhs ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2

         160908   ~1%    {3} r6 = r4 UNION r5
          94246   ~0%    {4} r7 = JOIN r6 WITH Synth#5f134a93::Synth::convertNamedPatternToRaw#1#ff ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2, Lhs.0
          91309   ~1%    {2} r8 = JOIN r7 WITH named_patterns ON FIRST 2 OUTPUT Lhs.3, Lhs.2
                         return r8
```
Copy link
Copy Markdown
Contributor

@geoffw0 geoffw0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performance fix works for me as well. We can wait for DCA as well just to be safe.

@d10c d10c merged commit 4d2e304 into github:main Jun 14, 2023
@d10c
Copy link
Copy Markdown
Contributor Author

d10c commented Jun 14, 2023

Merging; DCA looks good, the reported MethodLookupExpr::getUnderlying slowdown appears to be just noise, given that there is no change:

Before:

[2023-06-14 18:13:17] Evaluated non-recursive predicate MethodLookupExpr#fbdb774d::MethodLookupExpr::getUnderlying#0#dispred#ff@70e709ff in 2ms (size: 169151).
Evaluated relational algebra for predicate MethodLookupExpr#fbdb774d::MethodLookupExpr::getUnderlying#0#dispred#ff@70e709ff with tuple counts:
        169151  ~0%    {2} r1 = SCAN Synth#5f134a93::Synth::TMethodLookupExpr#ff OUTPUT In.1, In.0
                       return r1

After:

[2023-06-14 18:13:17] Evaluated non-recursive predicate MethodLookupExpr#fbdb774d::MethodLookupExpr::getUnderlying#0#dispred#ff@70e709ff in 2ms (size: 169151).
Evaluated relational algebra for predicate MethodLookupExpr#fbdb774d::MethodLookupExpr::getUnderlying#0#dispred#ff@70e709ff with tuple counts:
        169151  ~0%    {2} r1 = SCAN Synth#5f134a93::Synth::TMethodLookupExpr#ff OUTPUT In.1, In.0
                       return r1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants