Speed up ConfigModule._get_dict by avoiding unnecessary work #179734
frgossen wants to merge 1 commit into gh/frgossen/16/base
_get_dict is called from save_config_portable on every AOT autograd cache key computation.

1. It called copy.deepcopy on every config value, but the vast majority are immutable types (bool, int, str, None) that don't need copying. Now only list/set/dict values are deep-copied.
2. It went through __getattr__ for every value, which includes deprecation warning checks, alias resolution, and other overhead. Now reads values directly from config entries.

On a vLLM Meta-Llama-3-70B-Instruct TP=4 benchmark, this reduces cold compile time from 29.40 ± 0.90 s to 28.50 ± 0.40 s (1.03x) and cache lookup time from 6.25 ± 0.30 ms to 5.20 ± 0.45 ms (1.20x).

Authored with Claude.
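For readers unfamiliar with the pattern, the two changes in the description can be illustrated with a small, self-contained sketch. This is not the actual ConfigModule code: `Entry`, `ToyConfig`, `get_dict_slow`, and `get_dict_fast` are hypothetical names invented for illustration, and the real entry layout and attribute lookup logic may differ.

```python
import copy
from dataclasses import dataclass
from typing import Any

_UNSET = object()  # sentinel meaning "no user override" (hypothetical)


@dataclass
class Entry:
    """Hypothetical stand-in for a single config entry."""
    default: Any
    user_override: Any = _UNSET

    def value(self) -> Any:
        return self.default if self.user_override is _UNSET else self.user_override


class ToyConfig:
    """Toy model of a config module; not the real ConfigModule."""

    def __init__(self, entries: dict) -> None:
        self._entries = entries

    def __getattr__(self, name: str) -> Any:
        # Stand-in for the generic attribute path. In the real module this
        # path also does deprecation and alias handling, per the description.
        entries = self.__dict__.get("_entries", {})
        if name not in entries:
            raise AttributeError(name)
        return entries[name].value()

    def get_dict_slow(self) -> dict:
        # Old approach: generic attribute lookup plus deepcopy of every value.
        return {k: copy.deepcopy(getattr(self, k)) for k in self._entries}

    def get_dict_fast(self) -> dict:
        # New approach: read entries directly and copy only mutable containers.
        out = {}
        for key, entry in self._entries.items():
            val = entry.value()
            out[key] = copy.deepcopy(val) if isinstance(val, (list, set, dict)) else val
        return out
```

Both methods return equal dictionaries. The fast path skips the per-value copy for immutable values while still isolating callers from later mutation of list, set, and dict values.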
Starting merge as part of PR stack under #179910
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours).
cc @oulgen @jamesjwu @aorenste @anijain2305 @laithsakka @penguinwu @masnesral @coconutruben @aditvenk