Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/179756
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure, 10 Pending — as of commit e70f946 with merge base 13d0e53. NEW FAILURE: the following job has failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Attention! native_functions.yaml was changed. If you are adding a new function or defaulted argument to native_functions.yaml, you cannot use it from pre-existing Python frontend code until our FC window passes (two weeks). Split your PR into two PRs: one which adds the new C++ functionality, and one that makes use of it from Python, and land them two weeks apart. See https://github.com/pytorch/pytorch/wiki/PyTorch's-Python-Frontend-Backward-and-Forward-Compatibility-Policy#forwards-compatibility-fc for more info. Caused by:
Attention! One of PyTorch's C-stable API files was changed. You MUST NOT change existing function declarations in this file, as this header defines a stable C ABI. If you need to change a function's signature, introduce a new v2 version of the function and modify code generation to target the new version. Caused by:
I suppose it's worth mentioning that #179388 was already submitted to implement this. But I'm guessing that this one might give better performance, since it can calculate both grads in the same kernel.
I waited for the author to fix the numerics for some time (even in their original #159421 issue), but it never happened.
```cpp
// Bilinear backward kernel
template <typename Pad, typename T>
kernel void grid_sampler_2d_backward_bilinear(
```
It would be nice to combine the bilinear, bicubic, and nearest kernels into one, with a function pointer template parameter to switch between them, since there are a lot of identical lines of code in the three kernels. But maybe it's not worth the trouble, if the function pointer would need a huge list of arguments
Have you tested which one is more performant? I know you pass mode as an argument to grid_sampler_3d, but for some reason CUDA follows this pattern for both interpolate and 2d... Template arguments do sound reasonable to me.
I can benchmark it as a followup PR. This one is a literal transpilation of the CUDA shader into Metal without significant architectural changes. But if selecting the interpolation mode dynamically and statically (via kernel dispatch) yield the same perf, it indeed stands to reason to use it.
I haven't tested the performance. I had assumed that the compiler would make them the same, but I agree that there is a chance that it doesn't
Hi, thanks for opening this PR. My original PR #159421 did not have any numerical issues and all tests were passing. I removed the GridSampler2D backward from my current PR #179388 to avoid conflicts with this PR.
@pytorchbot merge -f "Lint + MPS are green"
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Implement the MPS backward pass for `grid_sampler_3d`, supporting bilinear and nearest interpolation across `float32`, `float16`, and `bfloat16` dtypes. The backward kernels use `atomic<float>` for grad_input accumulation (Metal 3 lacks `atomic<half>`/`atomic<bfloat>`), with automatic dtype conversion on output. Intermediate computations use `opmath_t<T>` to maintain precision for reduced-precision types.

The forward kernel is restructured to match the 2D kernel pattern: one thread per spatial position (`N*D*H*W`) with a channel loop, eliminating redundant grid reads and coordinate transforms across channels. Nearest-neighbor interpolation is also added for the 3D forward pass.

The `grid_sampler_2d` backward has been removed from this PR since #179756 implements it independently. This PR is designed to rebase cleanly on top of that change. This is a reactivation of my stale PR #159421. @Skylion007 @kurtamohler

Pull Request resolved: #179388
Approved by: https://github.com/malfet
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
Stack from ghstack (oldest at bottom):
Implements `grid_sampler_2d_backward_mps` as a Metal kernel, ported from the CUDA implementation in GridSampler.cu / GridSampler.cuh. Supports all combinations of interpolation and padding modes for float32, float16, and bfloat16. Has been requested more than 10 times in #141287.

Benchmark: `grid_sampler_2d` backward on Apple M5 Pro

Benchmark script
Notes:
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>