Skip expensive debug_lines computation in AOT autograd cache #179733

Closed
frgossen wants to merge 1 commit into gh/frgossen/15/base from gh/frgossen/15/head

Conversation

@frgossen
Contributor

@frgossen frgossen commented Apr 8, 2026

Stack from ghstack (oldest at bottom):

FxGraphCachePickler.debug_lines re-hashes every attribute of the cache
details object individually. This runs unconditionally even when debug
logging is disabled.

Gate the computation behind log.isEnabledFor(logging.DEBUG) so the
cost is only paid when someone is actively debugging cache key
differences.

On a vLLM Meta-Llama-3-70B-Instruct TP=4 benchmark, this reduces
cold compile time from 30.50 ± 0.50 s to 29.40 ± 0.90 s (1.04x)
and cache lookup time from 11.35 ± 0.45 ms to 6.25 ± 0.30 ms
(1.82x).
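
For reference, the quoted speedup factors are the ratios of the mean timings before and after the change. The small sketch below reproduces them from the numbers above; the `speedup` helper is a hypothetical name, and the ± spreads are ignored here.

```python
def speedup(before: float, after: float) -> float:
    """Ratio of mean timings, rounded to two decimals."""
    return round(before / after, 2)


# Mean timings taken from the benchmark numbers quoted above.
cold_compile = speedup(30.50, 29.40)  # seconds
cache_lookup = speedup(11.35, 6.25)   # milliseconds
```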

Authored with Claude.

cc @oulgen @jamesjwu @aorenste @anijain2305 @laithsakka @penguinwu @masnesral @coconutruben @aditvenk

@pytorch-bot

pytorch-bot Bot commented Apr 8, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/179733

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

✅ No Failures

As of commit d1011f0 with merge base acdb423:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot

pytorch-bot Bot commented Apr 8, 2026

This PR needs a release notes: label

If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@frgossen frgossen marked this pull request as draft April 8, 2026 18:36
@frgossen frgossen added the module: compile-time, release notes: aot autograd, and topic: performance labels Apr 8, 2026
@frgossen frgossen requested a review from zou3519 April 8, 2026 19:16
@frgossen frgossen marked this pull request as ready for review April 8, 2026 20:17
@frgossen frgossen removed the request for review from bdhirsh April 8, 2026 20:17
@pytorchmergebot
Collaborator

Starting merge as part of PR stack under #179910

@frgossen
Contributor Author

@pytorchbot merge

@pytorch-bot pytorch-bot Bot added the ciflow/trunk label Apr 10, 2026
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

pytorchmergebot pushed a commit that referenced this pull request Apr 13, 2026
_get_dict is called from save_config_portable on every AOT autograd
cache key computation.

1. It called copy.deepcopy on every config value, but the vast
   majority are immutable types (bool, int, str, None) that don't
   need copying. Now only list/set/dict values are deep-copied.

2. It went through __getattr__ for every value, which includes
   deprecation warning checks, alias resolution, and other overhead.
   Now reads values directly from config entries.
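
The first point above can be sketched as a selective copy. This is an illustration under stated assumptions, not the actual `_get_dict` implementation: `snapshot_values` and `_MUTABLE_CONTAINERS` are hypothetical names, and the config is modeled as a plain dict of entries so that values are read directly rather than through `__getattr__`.

```python
import copy
from typing import Any

# Container types whose contents could be mutated after the snapshot is taken.
_MUTABLE_CONTAINERS = (list, set, dict)


def snapshot_values(entries: dict[str, Any]) -> dict[str, Any]:
    """Return a snapshot of the entries; deep-copy only list/set/dict values."""
    return {
        # Immutable values (bool, int, str, None) are shared, not copied.
        name: copy.deepcopy(value) if isinstance(value, _MUTABLE_CONTAINERS) else value
        for name, value in entries.items()
    }
```

Immutable values are safe to share between the original and the snapshot, so the copy cost is paid only for the few container-valued entries.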

On a vLLM Meta-Llama-3-70B-Instruct TP=4 benchmark, this reduces
cold compile time from 29.40 ± 0.90 s to 28.50 ± 0.40 s (1.03x)
and cache lookup time from 6.25 ± 0.30 ms to 5.20 ± 0.45 ms
(1.20x).

Authored with Claude.

Pull Request resolved: #179734
Approved by: https://github.com/aorenste
ghstack dependencies: #179733

Labels

ciflow/inductor, ciflow/trunk, Merged, module: compile-time, release notes: aot autograd, topic: performance

3 participants