Skip to content

[3.15] gh-149584: Fix excessive overhead in the Tachyon profiler regarding the cache behavior (GH-149649)#150152

Merged
pablogsal merged 1 commit into
python:3.15from
miss-islington:backport-661df25-3.15
May 20, 2026
Merged

[3.15] gh-149584: Fix excessive overhead in the Tachyon profiler regarding the cache behavior (GH-149649)#150152
pablogsal merged 1 commit into
python:3.15from
miss-islington:backport-661df25-3.15

Conversation

@miss-islington
Copy link
Copy Markdown
Contributor

@miss-islington miss-islington commented May 20, 2026

Use exact remote reads for interpreter state, thread state, and
interpreter frame structs instead of pulling full remote pages into the
profiler page cache. This matches the core change from
#149585.

The profiler clears the page cache between samples, so live entries are
always packed at the front. Track the live count and only clear/search
that prefix instead of scanning all 1024 slots on the hot path.

Use the frame cache to predict the next thread state and top frame
address, then batch interpreter/thread/frame reads with process_vm_readv
when profiling a Linux target. Reuse prefetched frame buffers in the
frame walker when the prediction is valid.

Cache the last FrameInfo tuple per code object/instruction offset, reuse
cached thread id objects, and append cached parent frames directly on
full frame-cache hits. This cuts Python allocation churn in the
steady-state profiler path.
(cherry picked from commit 661df25)

Co-authored-by: Pablo Galindo Salgado Pablogsal@gmail.com

…ding the cache behavior (pythonGH-149649)

Use exact remote reads for interpreter state, thread state, and
interpreter frame structs instead of pulling full remote pages into the
profiler page cache. This matches the core change from
python#149585.

The profiler clears the page cache between samples, so live entries are
always packed at the front. Track the live count and only clear/search
that prefix instead of scanning all 1024 slots on the hot path.

Use the frame cache to predict the next thread state and top frame
address, then batch interpreter/thread/frame reads with process_vm_readv
when profiling a Linux target. Reuse prefetched frame buffers in the
frame walker when the prediction is valid.

Cache the last FrameInfo tuple per code object/instruction offset, reuse
cached thread id objects, and append cached parent frames directly on
full frame-cache hits. This cuts Python allocation churn in the
steady-state profiler path.
(cherry picked from commit 661df25)

Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
@pablogsal pablogsal enabled auto-merge (squash) May 20, 2026 11:33
@pablogsal pablogsal merged commit 034c536 into python:3.15 May 20, 2026
60 checks passed
@miss-islington miss-islington deleted the backport-661df25-3.15 branch May 20, 2026 11:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants