Triton Backend Snapshot: Graph Fusion, NVRTC/PTX Dispatch, and Kernel Correctness #10420

Open
agibsonccc wants to merge 1 commit into master from pr/triton-backend-kernels

@agibsonccc
Contributor

Summary

Brings in updates to the Triton, NVRTC, and PTX graph backends and the kernel surface, including op category mapping, fusion-path fixes, and backend-dispatch correctness improvements for graph execution.

Source snapshot commit: b5893454f08c91e7bafb47478757c84a5f1cab04 from ag_new_release_updates_2.

Included Files

  • libnd4j/include/graph/FusionPass.h
  • libnd4j/include/graph/GraphBackend.h
  • libnd4j/include/graph/cpu/AclGraphBackend.cpp
  • libnd4j/include/graph/cpu/AclGraphBackend.h
  • libnd4j/include/graph/cpu/OneDnnGraphBackend.cpp
  • libnd4j/include/graph/cpu/OneDnnGraphBackend.h
  • libnd4j/include/graph/gpu/GpuKernelLauncher.cu
  • libnd4j/include/graph/gpu/GpuKernelLauncher.h
  • libnd4j/include/graph/gpu/NvrtcGraphBackend.cpp
  • libnd4j/include/graph/gpu/NvrtcGraphBackend.h
  • libnd4j/include/graph/gpu/NvrtcKernelBuilder.cpp
  • libnd4j/include/graph/gpu/NvrtcKernelBuilder.h
  • libnd4j/include/graph/gpu/NvrtcKernelCache.cu
  • libnd4j/include/graph/gpu/NvrtcKernelCache.h
  • libnd4j/include/graph/gpu/OpCategoryTable.h
  • libnd4j/include/graph/gpu/PtxGraphBackend.cpp
  • libnd4j/include/graph/gpu/PtxGraphBackend.h
  • libnd4j/include/graph/gpu/TritonGraphBackend.cpp
  • libnd4j/include/graph/gpu/TritonGraphBackend.h
  • libnd4j/include/graph/gpu/TritonIRBuilder.cpp
  • libnd4j/include/graph/gpu/TritonIRBuilder.h
  • libnd4j/include/graph/gpu/TritonTargetDispatch.cpp
  • libnd4j/include/graph/gpu/TritonTargetDispatch.h
  • libnd4j/include/graph/impl/Context.cpp
  • libnd4j/include/graph/impl/FusionPass.cpp
  • libnd4j/include/ops/declarable/generic/nn/dora_matmul.cpp
  • libnd4j/include/ops/declarable/generic/nn/dot_product_attention.cpp
  • libnd4j/include/ops/declarable/generic/nn/fused_elementwise_chain.cpp
  • libnd4j/include/ops/declarable/generic/nn/llm_ops.cpp
  • libnd4j/include/ops/declarable/generic/nn/loha_matmul.cpp
  • libnd4j/include/ops/declarable/generic/nn/lokr_matmul.cpp
  • libnd4j/include/ops/declarable/generic/nn/lora_matmul.cpp
  • libnd4j/include/ops/declarable/generic/nn/multi_lora_matmul.cpp
  • libnd4j/include/ops/declarable/generic/nn/selective_scan.cpp
  • libnd4j/include/ops/declarable/generic/nn/smooth_quant.cpp
  • libnd4j/include/ops/declarable/headers/llm.h
  • libnd4j/include/ops/declarable/helpers/cpu/windowed_attention.cpp
  • libnd4j/include/ops/declarable/helpers/cuda/col2im.cu
  • libnd4j/include/ops/declarable/helpers/cuda/fusedElementwiseChain.cu
  • libnd4j/include/ops/declarable/helpers/cuda/windowed_attention.cu
  • libnd4j/include/ops/declarable/helpers/fusedElementwiseChain.h
  • libnd4j/include/ops/declarable/impl/DeclarableOp.cpp
  • libnd4j/include/system/op_boilerplate.h
