
gh-150494: Sampling mode for tracemalloc #151935

Open
danielsn wants to merge 1 commit into python:main from danielsn:dsn/sample-tracemalloc


@danielsn danielsn commented Jun 22, 2026


Tracemalloc tracks every allocation, which is useful for debugging but expensive in both CPU (collecting a stack trace for every allocation) and memory (storing tracking metadata for every live object). In many cases this overhead is unnecessary: a statistical sample is sufficient to explain both high memory consumption and memory leaks.

This PR adds a Poisson sampling mode to tracemalloc. In the common case an allocation is not sampled, so the CPU cost of tracemalloc is just an increment and a comparison, and the additional memory cost is zero. When an allocation is sampled, the cost is the same as before.
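
To illustrate why the unsampled path can be so cheap, here is a minimal Python sketch of a byte-countdown Poisson sampler. It is not the code in this PR: the class `ByteCountdownSampler`, its `rate` parameter, and the `should_sample` method are hypothetical names chosen for illustration, and the sketch counts down rather than up. The gaps between sample points are drawn from an exponential distribution with mean `rate` bytes, so an allocation of size S ends up sampled with probability 1 - exp(-S / rate).

```python
import random


class ByteCountdownSampler:
    """Hypothetical sampler: on average one sample point per `rate` bytes."""

    def __init__(self, rate, seed=None):
        self.rate = rate
        self._rng = random.Random(seed)
        self._remaining = self._next_gap()

    def _next_gap(self):
        # Exponentially distributed gaps (mean `rate` bytes) make the
        # sample points a Poisson process over the allocated byte stream.
        return self._rng.expovariate(1.0 / self.rate)

    def should_sample(self, size):
        # Fast path: one subtraction and one comparison.
        self._remaining -= size
        if self._remaining > 0:
            return False
        # Slow path: a sample point fell inside this allocation.
        # The exponential distribution is memoryless, so drawing a fresh
        # gap here keeps each allocation's sampling decision independent.
        self._remaining = self._next_gap()
        return True


if __name__ == "__main__":
    sampler = ByteCountdownSampler(rate=512 * 1024, seed=1)
    sampled = sum(sampler.should_sample(64) for _ in range(1_000_000))
    print(f"sampled {sampled} of 1000000 allocations")
```

Only the slow path would need to collect a stack trace and store tracking metadata, which is where the cost of exact tracemalloc comes from.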

Running pyperformance --fast gives the following performance results:

Runtime overhead vs baseline

| Mode | Matched benchmarks | Geomean | Median | P90 | Min | Max |
| --- | --- | --- | --- | --- | --- | --- |
| Exact tracemalloc | 97 | 3.33x | 3.36x | 8.68x | 0.99x | 23.30x |
| Sampled tracemalloc | 97 | 1.23x | 1.19x | 1.51x | 1.00x | 2.13x |

Peak RSS overhead vs baseline

| Mode | Matched benchmarks | Geomean | Median | P90 | Min | Max |
| --- | --- | --- | --- | --- | --- | --- |
| Exact tracemalloc | 97 | 1.52x | 1.51x | 1.83x | 1.13x | 2.58x |
| Sampled tracemalloc | 97 | 1.02x | 1.02x | 1.03x | 0.86x | 1.10x |

Absolute peak RSS delta vs baseline

| Mode | Mean delta | Median delta | P90 delta | Min delta | Max delta |
| --- | --- | --- | --- | --- | --- |
| Exact tracemalloc | +13.95 MiB | +6.47 MiB | +27.62 MiB | +2.05 MiB | +106.23 MiB |
| Sampled tracemalloc | +0.31 MiB | +0.38 MiB | +0.56 MiB | -4.22 MiB | +3.05 MiB |

Prior art

Go's heap profiler uses Horvitz-Thompson weighting for sampled allocations: an allocation of size S is sampled with probability p = 1 - exp(-S / rate), and the sample is credited with S / p. This keeps per-allocation-site estimates unbiased even when allocation sizes are mixed.
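
The weighting formula above can be sketched in a few lines of Python. This is an illustration of the formula only, not Go's implementation or the code in this PR; the function names `sample_probability`, `sample_weight`, and `estimate_total_bytes` are hypothetical, and sampling is simulated with an independent random draw per allocation.

```python
import math
import random


def sample_probability(size, rate):
    # p = 1 - exp(-size / rate); expm1 avoids precision loss when
    # size is much smaller than rate.
    return -math.expm1(-size / rate)


def sample_weight(size, rate):
    # Horvitz-Thompson weight: a sampled allocation stands in for
    # size / p bytes, so its expected contribution is exactly `size`.
    return size / sample_probability(size, rate)


def estimate_total_bytes(sizes, rate, rng):
    # Simulate sampling each allocation independently and sum the
    # weights of the allocations that were sampled.
    total = 0.0
    for size in sizes:
        if rng.random() < sample_probability(size, rate):
            total += sample_weight(size, rate)
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    sizes = [16] * 50_000 + [4096] * 500  # mixed small and large allocations
    print("true bytes:     ", sum(sizes))
    print("estimated bytes:", round(estimate_total_bytes(sizes, 4096, rng)))
```

Two limits are worth noting: for allocations much smaller than `rate`, the weight approaches `rate` (each sample stands for roughly one sampling interval), while for allocations much larger than `rate`, p approaches 1 and the weight approaches the allocation's own size.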

TCMalloc uses a closely related scheme: each sampled allocation is weighted based on its own size and the sampling interval/overshoot, rather than being credited with all bytes accumulated since the previous sample simply because it crossed the threshold.

Thanks to @wincent for reviewing a previous draft of this code and proposing the use of Horvitz-Thompson weighting.

@read-the-docs-community


Documentation build overview

📚 cpython-previews | 🛠️ Build #33256425 | 📁 Comparing 3d04b1f against main (27148d0)


4 files changed
± c-api/init_config.html
± library/tracemalloc.html
± using/cmdline.html
± whatsnew/changelog.html
