Reduce memory usage by Dependant by ~50% #15336

Open: ipeluffo wants to merge 16 commits into fastapi:master from ipeluffo:perf/improve-memory-dependencies
@ipeluffo

Context

In #14742, I reported a significant spike in memory usage when upgrading FastAPI from v0.120.4 to v0.121.x (and later versions).

My analysis traced the regression to this refactor, which replaced computed fields in __post_init__ with functools.cached_property.

Root cause

The Python docs for functools.cached_property note:

This decorator interferes with the operation of [PEP 412](https://peps.python.org/pep-0412/) key-sharing dictionaries. This means that instance dictionaries can take more space than usual.

While the worst of this was addressed in Python 3.12+ (python/cpython#101815), some overhead remains: accessing __dict__ on an instance forces creation of a full dictionary object rather than a compact PyDictValues, adding a PyDictObject (three fields) per instance. At scale, this adds up.

The exact quote from the cpython issue:

It remains true that cached_property can cause instances to use slightly more memory, but the reason for this has changed (and the extra memory used will be significantly less.) The new reason is that any access of __dict__ on an instance will force creation of a real dictionary object for that instance, rather than just a PyDictValues. The keys remain shared in the created dict, though. So the added memory use is no longer "all the keys" but rather just "a PyDictObject" (which consists of only three fields.)
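To make the mechanism concrete, here is a minimal sketch showing that functools.cached_property stores its result in the instance's __dict__ on first access. The `Example` class and its attribute names are hypothetical and are not part of FastAPI; the sketch only demonstrates where the cached value lives, not how much memory it costs.

```python
from functools import cached_property


class Example:
    """Hypothetical class, used only to illustrate the mechanism."""

    def __init__(self, value: int) -> None:
        self.value = value

    @cached_property
    def doubled(self) -> int:
        return self.value * 2


obj = Example(21)
keys_before = set(vars(obj))  # vars() reads obj.__dict__
result = obj.doubled          # first access computes and caches the value
keys_after = set(vars(obj))   # the cached value is now an instance dict entry

print(keys_before, keys_after, result)
```

The cached value appears as a regular key in the instance dictionary, which is why every instance that uses a cached property needs a materialised __dict__.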

Benchmarks

I wrote a script to measure memory usage across varying numbers of Dependant instances:

Benchmark script
import gc
import tracemalloc
from fastapi.dependencies.models import Dependant

samples_tests = (
    200_000,
    100_000,
    50_000,
    1_000,
    500,
)

def measure(samples: int):
    gc.collect()
    tracemalloc.start()
    objs = [Dependant() for _ in range(samples)]

    # Trigger all cached properties
    for o in objs:
        o.oauth_scopes
        o.cache_key
        o._uses_scopes
        o._is_security_scheme
        # o._security_scheme  # skipped: asserts a SecurityBase, not None
        o._security_dependencies
        o.is_gen_callable
        o.is_async_gen_callable
        o.is_coroutine_callable
        o.computed_scope

    current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return current, peak, objs  # keep objs alive until after measurement

for samples in samples_tests:
    cur, peak, _ = measure(samples)
    print(f"Samples: {samples} - current={cur/1e6:.1f} MB  peak={peak/1e6:.1f} MB")

Before (current master, Python 3.13.12):

Samples: 200000 - current=318.4 MB  peak=318.4 MB
Samples: 100000 - current=159.2 MB  peak=159.2 MB
Samples: 50000 - current=79.6 MB  peak=79.6 MB
Samples: 1000 - current=1.6 MB  peak=1.6 MB
Samples: 500 - current=0.8 MB  peak=0.8 MB

After (this PR):

Samples: 200000 - current=156.8 MB  peak=156.8 MB
Samples: 100000 - current=78.4 MB  peak=78.4 MB
Samples: 50000 - current=39.2 MB  peak=39.2 MB
Samples: 1000 - current=0.8 MB  peak=0.8 MB
Samples: 500 - current=0.4 MB  peak=0.4 MB

Memory usage is reduced by approximately 50% across all cases.
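As a quick sanity check on the figures above, the reported totals can be converted into an approximate per-instance cost. This is a small sketch that uses only the numbers printed by the benchmark (which divides by 1e6 for MB); the helper names are my own, and the two smallest sample sizes are left out because their totals are rounded too coarsely to be useful.

```python
# Totals reported above, in MB (1 MB = 1e6 bytes, as in the benchmark script).
before_mb = {200_000: 318.4, 100_000: 159.2, 50_000: 79.6}
after_mb = {200_000: 156.8, 100_000: 78.4, 50_000: 39.2}


def bytes_per_instance(total_mb: float, samples: int) -> float:
    """Approximate traced bytes per Dependant instance."""
    return total_mb * 1e6 / samples


def reduction(before: float, after: float) -> float:
    """Fraction of memory saved going from `before` to `after`."""
    return 1 - after / before


for samples in before_mb:
    b = bytes_per_instance(before_mb[samples], samples)
    a = bytes_per_instance(after_mb[samples], samples)
    saved = reduction(before_mb[samples], after_mb[samples])
    print(f"{samples}: ~{b:.0f} B -> ~{a:.0f} B per instance ({saved:.1%} saved)")
```

On these numbers each instance goes from roughly 1592 bytes to roughly 784 bytes of traced memory, a reduction of about 50.8%.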

Approach

The key enabler for the memory reduction is the use of __slots__. However, __slots__ is not compatible with functools.cached_property, so the caching strategy had to change.
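The incompatibility is easy to reproduce. In the sketch below, `Slotted` is a hypothetical class (not FastAPI code): because it declares __slots__ and has no __dict__, functools.cached_property has nowhere to store its result and raises TypeError on first access.

```python
from functools import cached_property
from typing import Optional


class Slotted:
    """Hypothetical slotted class; instances have no __dict__."""

    __slots__ = ("value",)

    def __init__(self, value: int) -> None:
        self.value = value

    @cached_property
    def doubled(self) -> int:
        return self.value * 2


obj = Slotted(21)
error: Optional[TypeError] = None
try:
    obj.doubled  # cached_property needs obj.__dict__ to cache the result
except TypeError as exc:
    error = exc

print(type(error).__name__, error)
```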

I considered reverting to the previous __post_init__ approach but decided against it for two reasons:

  1. No lazy initialization — all computed fields would be evaluated eagerly on every Dependant instantiation, even if never accessed.
  2. Awkward dependency ordering — several computed fields depend on each other, making __post_init__ harder to maintain.
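For illustration, the eager alternative described in the two points above might look roughly like the following. This is a hypothetical dataclass with made-up fields, not the previous FastAPI implementation; it only shows why every computed field is evaluated up front and why the assignment order inside __post_init__ matters.

```python
from dataclasses import dataclass, field


@dataclass
class EagerExample:
    """Hypothetical example of eagerly computed fields."""

    scopes: list[str] = field(default_factory=list)
    # Computed fields: excluded from __init__, filled in by __post_init__.
    uses_scopes: bool = field(init=False)
    cache_key: tuple[object, ...] = field(init=False)

    def __post_init__(self) -> None:
        # Everything is computed on every instantiation, even if never read.
        self.uses_scopes = bool(self.scopes)
        # Order matters: cache_key reads uses_scopes, so it must come second.
        self.cache_key = (tuple(sorted(self.scopes)), self.uses_scopes)


example = EagerExample(scopes=["b", "a"])
print(example.uses_scopes, example.cache_key)
```

Swapping the two assignments in __post_init__ would raise AttributeError, which is the kind of ordering constraint that grows harder to track as more computed fields are added.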

Instead, this PR introduces a different caching mechanism that preserves lazy evaluation while being compatible with __slots__.
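The description does not show the mechanism itself, so the following is only a sketch of one way lazy caching can coexist with __slots__, not the implementation in this PR. The `slot_cached_property` descriptor, the `Node` class, and the `_cached_` slot naming are all assumptions made for the example: the descriptor stores the computed value in a dedicated slot and relies on an unset slot raising AttributeError.

```python
from typing import Any, Callable, Optional


class slot_cached_property:
    """Hypothetical lazy property that caches into a named slot."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func
        # The owning class must declare this name in its __slots__.
        self.slot = f"_cached_{func.__name__}"

    def __get__(self, instance: Any, owner: Optional[type] = None) -> Any:
        if instance is None:
            return self
        try:
            # Reading an unset slot raises AttributeError.
            return getattr(instance, self.slot)
        except AttributeError:
            value = self.func(instance)
            setattr(instance, self.slot, value)
            return value


calls: list[int] = []  # records each real computation, for demonstration


class Node:
    __slots__ = ("value", "_cached_doubled")

    def __init__(self, value: int) -> None:
        self.value = value

    @slot_cached_property
    def doubled(self) -> int:
        calls.append(self.value)
        return self.value * 2


node = Node(21)
first = node.doubled   # computed and stored in the _cached_doubled slot
second = node.doubled  # served from the slot, no recomputation
print(first, second, len(calls))
```

The trade-off in this sketch is that each cached attribute needs its own slot declared up front, in exchange for instances that never allocate a __dict__.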

ipeluffo and others added 14 commits April 14, 2026 10:57
@codspeed-hq

codspeed-hq Bot commented Apr 14, 2026

Merging this PR will not alter performance

✅ 20 untouched benchmarks


Comparing ipeluffo:perf/improve-memory-dependencies (95cd016) with master (6f9a102) (see footnote 1)

Footnotes

  1. No successful run was found on master (9653034) during the generation of this report, so 6f9a102 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

Comment on lines +21 to +25
     unwrapped = inspect.unwrap(cast(Callable[..., Any], _impartial(call)))
     return unwrapped


-def _impartial(func: Callable[..., Any]) -> Callable[..., Any]:
+def _impartial(func: Callable[..., Any] | None) -> Callable[..., Any] | None:
@ipeluffo ipeluffo Apr 14, 2026

[info] This change is unrelated to my PR, but mypy was complaining in the pre-commit check. The change looks safe and sensible.

@ipeluffo ipeluffo marked this pull request as draft April 14, 2026 11:10
@ipeluffo ipeluffo marked this pull request as ready for review April 14, 2026 11:12
@gryevns

gryevns commented Apr 28, 2026

@YuriiMotov is there anything needed to move this one forward and get some feedback/review? Thanks

@ipeluffo

@tiangolo, it'd be great if we could get some feedback on this before the PR gets lost. Thanks.
