Skip to content

[mypyc] Fix @property getter memory leak#21230

Open
VaggelisD wants to merge 1 commit intopython:masterfrom
VaggelisD:property_getter_leak
Open

[mypyc] Fix @property getter memory leak#21230
VaggelisD wants to merge 1 commit intopython:masterfrom
VaggelisD:property_getter_leak

Conversation

@VaggelisD
Copy link
Copy Markdown
Contributor

Issue

Mypyc can incorrectly borrow references from property getters (owned references).

The borrow mechanism is safe for struct field access since they return a pointer into the object's memory; Property getters on the other hand are method calls that return new owned references i.e the caller must DECREF them. If mypyc mistakes a property for a struct field, it skips the DECREF, leaking one reference per call.

I think this affects any expression context that enables borrowing (comparisons, isinstance, is None checks) when the borrowed expression is a property access on a cross-module class. We discovered this through OOM issues in SQLGlot, where isinstance(self.this, Func) (this is a @property) leaked on every call.

Repro

  # base.py:
  class Bar:
      pass  
          
  class Foo:
      def __init__(self) -> None:
          self.obj: object = Bar()

      @property
      def val(self) -> object:
          return self.obj  
                                                                  
  # derived.py:
  from base import Foo, Bar
                           
  def check(foo: Foo) -> bool:
      return isinstance(foo.val, Bar)

  # test_leak.py: 
  import sys
  from base import Foo 
  from derived import check    
                           
  foo = Foo()
  init = sys.getrefcount(foo.obj)
  for _ in range(100):     
      check(foo) 
  print(f"Leaked refs: {sys.getrefcount(foo.obj) - init}")

Compile both with mypyc passing in PYTHONHASHSEED=3 prints Leaked refs: 100.

Root cause

  1. is_native_attr_ref() decides if an attribute access can safely borrow by checking has_attr(name) and not get_method(name) i.e "if there's an attribute but no method, it's a struct field".
  2. Although has_attr() is always fully populated, get_method() depends on whether the class's module (base in this case) has been compiled yet.
  3. Modules within an SCC are compiled in nondeterministic order; If the module defining the property hasn't been compiled yet -> get_method() returns None -> is_native_attr_ref incorrectly treats the property as a borrowed struct field.

I think there's a broader issue here: Even when derived.py imports from base.py (no cycle), mypyc can place them in the same SCC and compile derived before base. Since ir.methods is only populated when a module's function bodies are compiled, any code that reads ir.methods (i.e get_methods) during compilation of a different module in the same SCC could have similar order-dependent bugs.

The suggested fix

Check ir.attributes directly (struct fields only, always populated) instead of the unreliable get_method. I believe this is the getter-side counterpart to #21095 which fixed the same class of bug for property setters.

`is_native_attr_ref()` uses `has_attr(name) and not get_method(name)` to
decide if an attribute access can borrow. `has_attr()` is populated
during preparation (always complete), but `get_method()` checks
`ir.methods` which is populated during compilation. When cross-module
classes haven't been compiled yet, `get_method()` returns None for
property getters, incorrectly allowing borrowing.

Property getters return new owned references, so skipping the DECREF
leaks one reference per call.

The fix checks `ir.attributes` directly (struct fields only, always
populated during preparation). Properties are never in `ir.attributes`,
so they always return False.

This is the getter-side counterpart to python#21095.
Copy link
Copy Markdown
Collaborator

@p-sawicki p-sawicki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a broader issue here: Even when derived.py imports from base.py (no cycle), mypyc can place them in the same SCC and compile derived before base. Since ir.methods is only populated when a module's function bodies are compiled, any code that reads ir.methods (i.e get_methods) during compilation of a different module in the same SCC could have similar order-dependent bugs.

yes that's a bug. we shouldn't rely on members that are only filled during compilation of an SCC because it leads to non-determinism as you've pointed out. so functions like the one you've fixed should only query members that are populated before any of the SCCs are compiled in prepare.py.

Comment on lines +5948 to +5957
def test_property_getter_no_leak() -> None:
foo = Foo()
gc.collect()
before = gc.get_objects()
for _ in range(100):
check(foo)
gc.collect()
after = gc.get_objects()
diff = len(after) - len(before)
assert diff <= 2, diff
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this test fail for you without your changes? for me it doesn't and i think it's because there's only one Foo and Bar created so even though it's leaking references it's not leaking any more new objects.

to test if we're no longer leaking new objects i think you could move the creation of Foo into the loop.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants