[cuda graphs] Integrate kernel annotations into inductor cudagraph trees #179769

Draft
yushangdi wants to merge 8 commits into gh/yushangdi/25/base from gh/yushangdi/25/head

@yushangdi commented Apr 8, 2026

Stack from ghstack (oldest at bottom):

Add config.triton.cudagraph_kernel_annotations (default False) and wire
up clear/resolve/remap calls in CUDAGraphNode._record() so that
mark_kernels() scopes firing during cudagraph capture are automatically
processed.

WIP: the integration runs, but it still has bugs and the annotations are not yet accurate.
Authored with Claude.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @jataylo
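
The clear/resolve/remap sequence described above can be pictured with a small, torch-free sketch. Everything in it is hypothetical: `ToyAnnotationStore`, its `mark_kernels`, `launch`, `clear`, `resolve`, and `remap` methods, `toy_record`, and the `cudagraph_kernel_annotations` keyword are illustrative stand-ins, not the inductor API and not this PR's implementation. The assumed semantics are: clear stale state before capture, resolve the scopes that fired into per-kernel labels, then remap those labels onto graph node ids.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class ToyAnnotationStore:
    """Hypothetical stand-in for annotation state; not a PyTorch API."""

    pending: list = field(default_factory=list)   # (scope, kernel) pairs seen during capture
    resolved: dict = field(default_factory=dict)  # kernel name -> scope name
    by_node: dict = field(default_factory=dict)   # graph node id -> scope name
    _active: list = field(default_factory=list)   # stack of currently open scopes

    @contextmanager
    def mark_kernels(self, name):
        # Toy version of an annotation scope: push on entry, pop on exit.
        self._active.append(name)
        try:
            yield
        finally:
            self._active.pop()

    def launch(self, kernel_name):
        # Record the kernel under the innermost open scope, if any.
        if self._active:
            self.pending.append((self._active[-1], kernel_name))

    def clear(self):
        # Assumed meaning of "clear": drop state left over from earlier captures.
        self.pending.clear()
        self.resolved.clear()
        self.by_node.clear()

    def resolve(self):
        # Assumed meaning of "resolve": turn raw scope events into kernel labels.
        for scope, kernel in self.pending:
            self.resolved[kernel] = scope

    def remap(self, kernel_to_node):
        # Assumed meaning of "remap": re-key labels by captured graph node id.
        for kernel, scope in self.resolved.items():
            if kernel in kernel_to_node:
                self.by_node[kernel_to_node[kernel]] = scope


def toy_record(store, run_model, kernel_to_node, cudagraph_kernel_annotations=False):
    """Toy recording step: annotation processing runs only when the flag is on."""
    if cudagraph_kernel_annotations:
        store.clear()
    run_model()  # stands in for the work done during graph capture
    if cudagraph_kernel_annotations:
        store.resolve()
        store.remap(kernel_to_node)
    return dict(store.by_node)


store = ToyAnnotationStore()


def run_model():
    with store.mark_kernels("attention"):
        store.launch("triton_kernel_0")
    store.launch("triton_kernel_1")  # launched outside any scope, so not annotated


result = toy_record(
    store,
    run_model,
    {"triton_kernel_0": 7, "triton_kernel_1": 8},
    cudagraph_kernel_annotations=True,
)
print(result)
```

With the flag left at its default of `False`, the toy recording step skips all three calls and returns an empty mapping, which mirrors the opt-in behaviour the description implies.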

@pytorch-bot (bot) commented Apr 8, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/179769

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 2 New Failures, 2 Unrelated Failures

As of commit 2b68f16 with merge base a4a69be:

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

yushangdi added a commit that referenced this pull request Apr 8, 2026
ghstack-source-id: 655bb94
Pull Request resolved: #179769
@pytorch-bot (bot) commented Apr 8, 2026

This PR needs a `release notes:` label

If your changes are user facing and intended to be a part of release notes, please use a label starting with `release notes:`.

If not, please add the `topic: not user facing` label.

To add a label, you can comment to pytorchbot, for example
`@pytorchbot label "topic: not user facing"`

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@yushangdi marked this pull request as draft April 8, 2026 23:32
yushangdi added a commit that referenced this pull request Apr 9, 2026
ghstack-source-id: b41e2ef
Pull Request resolved: #179769
yushangdi added a commit that referenced this pull request Apr 9, 2026
ghstack-source-id: 337958e
Pull Request resolved: #179769
yushangdi added a commit that referenced this pull request Apr 9, 2026
ghstack-source-id: b9228d7
Pull Request resolved: #179769
yushangdi added a commit that referenced this pull request Apr 9, 2026
ghstack-source-id: 2432f63
Pull Request resolved: #179769
yushangdi added a commit that referenced this pull request Apr 9, 2026
ghstack-source-id: d9e5db1
Pull Request resolved: #179769
yushangdi added a commit that referenced this pull request Apr 10, 2026
ghstack-source-id: f4feccd
Pull Request resolved: #179769
yushangdi added a commit that referenced this pull request Apr 10, 2026
ghstack-source-id: e377c4b
Pull Request resolved: #179769