Speed up ConfigModule._get_dict by avoiding unnecessary work #179734

Closed
frgossen wants to merge 1 commit into gh/frgossen/16/base from gh/frgossen/16/head

@frgossen (Contributor) commented Apr 8, 2026

Stack from ghstack (oldest at bottom):

_get_dict is called from save_config_portable on every AOT autograd
cache key computation.

  1. It called copy.deepcopy on every config value, but the vast
    majority are immutable types (bool, int, str, None) that don't
    need copying. Now only list/set/dict values are deep-copied.

  2. It went through __getattr__ for every value, which includes
    deprecation warning checks, alias resolution, and other overhead.
    Now reads values directly from config entries.
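
The two changes above can be illustrated with a minimal sketch. This is not the code from torch/utils/_config_module.py: the `Entry` and `ToyConfig` classes, their fields, and the config keys are hypothetical stand-ins chosen for illustration, and the real entry layout may differ.

```python
import copy
from dataclasses import dataclass
from typing import Any

_UNSET = object()  # sentinel meaning "no user override"


@dataclass
class Entry:
    """Hypothetical config entry; not PyTorch's actual class."""
    default: Any
    user_override: Any = _UNSET

    def value(self) -> Any:
        return self.default if self.user_override is _UNSET else self.user_override


class ToyConfig:
    """Toy config holder used only to contrast the two strategies."""

    def __init__(self, entries: dict) -> None:
        self._entries = entries

    def get(self, name: str) -> Any:
        # Stand-in for the attribute path. In a real config module this is
        # where deprecation checks and alias resolution would add overhead.
        return self._entries[name].value()

    def get_dict_slow(self) -> dict:
        # Before: attribute-style lookup and deepcopy for every value.
        return {k: copy.deepcopy(self.get(k)) for k in self._entries}

    def get_dict_fast(self) -> dict:
        # After: read entries directly and deep-copy only mutable containers.
        out = {}
        for name, entry in self._entries.items():
            v = entry.value()
            out[name] = copy.deepcopy(v) if isinstance(v, (list, set, dict)) else v
        return out


cfg = ToyConfig({
    "flag": Entry(False),
    "level": Entry(3),
    "name": Entry("example"),
    "passes": Entry(["a", "b"]),
})

snapshot = cfg.get_dict_fast()
snapshot["passes"].append("c")  # mutating the snapshot must not leak back
print(cfg.get("passes"))        # ['a', 'b']
```

Both methods return equal dictionaries in this sketch. The fast variant skips the copy for immutable values and still isolates list, set, and dict values from later mutation.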

On a vLLM Meta-Llama-3-70B-Instruct TP=4 benchmark, this reduces
cold compile time from 29.40 ± 0.90 s to 28.50 ± 0.40 s (1.03x)
and cache lookup time from 6.25 ± 0.30 ms to 5.20 ± 0.45 ms
(1.20x).
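
As a quick check of the quoted speedup factors, the ratios can be recomputed from the reported means. The helper name `speedup` is introduced here for illustration only, and the error bars are ignored.

```python
def speedup(before: float, after: float) -> float:
    """Ratio of the old mean to the new mean."""
    return before / after


cold_compile = speedup(29.40, 28.50)  # seconds
cache_lookup = speedup(6.25, 5.20)    # milliseconds

print(f"cold compile: {cold_compile:.2f}x")  # cold compile: 1.03x
print(f"cache lookup: {cache_lookup:.2f}x")  # cache lookup: 1.20x
```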

Authored with Claude.

cc @oulgen @jamesjwu @aorenste @anijain2305 @laithsakka @penguinwu @masnesral @coconutruben @aditvenk

pytorch-bot Bot commented Apr 8, 2026

This PR needs a release notes: label

If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

frgossen added a commit that referenced this pull request Apr 8, 2026
ghstack-source-id: 936470c
Pull Request resolved: #179734
pytorch-bot Bot commented Apr 8, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/179734

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ No Failures

As of commit 751e8e4 with merge base acdb423:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@frgossen frgossen marked this pull request as draft April 8, 2026 18:36
@frgossen frgossen added the "topic: not user facing", "module: compile-time", and "topic: performance" labels and removed the "topic: not user facing" label Apr 8, 2026
@frgossen frgossen added the "topic: not user facing" and "release notes: aot autograd" labels and removed the "topic: not user facing" label Apr 8, 2026
@frgossen frgossen requested review from aorenste and zou3519 and removed request for zou3519 April 8, 2026 19:10
@frgossen frgossen marked this pull request as ready for review April 8, 2026 19:14
Comment thread torch/utils/_config_module.py
@pytorchmergebot (Collaborator)

Starting merge as part of PR stack under #179910

@frgossen frgossen added the ciflow/trunk Trigger trunk jobs on your pull request label Apr 13, 2026
@frgossen (Contributor, Author)

@pytorchbot merge

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status.

Labels

ciflow/trunk, Merged, module: compile-time, release notes: aot autograd, topic: performance

3 participants