[None][test] Update K2.5 and GLM-5 into CI Perf Test #14960
📝 Walkthrough
This PR extends TensorRT-LLM's multi-GPU performance sanity testing infrastructure by adding GLM-5-fp4 model support on B200, GB200, and GB300 hardware platforms. It updates the Jenkins pipeline orchestration logic and the integration test definitions, and adds benchmark reference configurations for both aggregated and disaggregated testing modes.
Changes
GLM-5-fp4 Multi-GPU Performance Testing
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 3 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
Warning: Review ran into problems
🔥 Problems
Stopped waiting for pipeline failures after 30000ms. One of your pipelines takes longer than our 30000ms fetch window to run, so the review may not consider pipeline-failure results for inline comments if any failures occurred after the fetch window. Increase the timeout if you want to wait longer, or run a …
🧹 Nitpick comments (4)
tests/scripts/perf-sanity/aggregated/glm5_fp4_grace_blackwell.yaml (1)
1-75: Coverage status: sufficient in-PR; one follow-up is outside this layer.
For tests/scripts/perf-sanity/aggregated/glm5_fp4_grace_blackwell.yaml, coverage is sufficient for GB200 aggregated 1k1k TEP/DEP variants.
Follow-up outside this PR layer (if not already handled in other stacked files): confirm CI selection references include all three new aggregated configs:
- tests/scripts/perf-sanity/aggregated/glm5_fp4_2_nodes_grace_blackwell.yaml
- tests/scripts/perf-sanity/aggregated/glm5_fp4_blackwell.yaml
- tests/scripts/perf-sanity/aggregated/glm5_fp4_grace_blackwell.yaml
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/scripts/perf-sanity/aggregated/glm5_fp4_grace_blackwell.yaml` around lines 1-75: the CI selection references need to include all three new aggregated config files so tests run for each variant; update whatever CI/selection list references (e.g., in the perf-sanity CI matrix or selection files) to add "tests/scripts/perf-sanity/aggregated/glm5_fp4_grace_blackwell.yaml", "tests/scripts/perf-sanity/aggregated/glm5_fp4_blackwell.yaml", and "tests/scripts/perf-sanity/aggregated/glm5_fp4_2_nodes_grace_blackwell.yaml" so the new GB200 aggregated 1k1k TEP/DEP variants are selected by CI. Ensure any selection logic that filters by the directory tests/scripts/perf-sanity/aggregated or by model_name "glm_5_nvfp4" also accounts for these three files.
tests/scripts/perf-sanity/aggregated/glm5_fp4_2_nodes_grace_blackwell.yaml (1)
1-75: Coverage status: sufficient for this file’s aggregated scope.
For tests/scripts/perf-sanity/aggregated/glm5_fp4_2_nodes_grace_blackwell.yaml, coverage is sufficient in this PR for GB200 aggregated perf sanity because both TEP (glm5_fp4_tep8_mtp3_8k1k) and DEP (glm5_fp4_dep8_mtp1_8k1k) variants are included with matching client workloads.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/scripts/perf-sanity/aggregated/glm5_fp4_2_nodes_grace_blackwell.yaml` around lines 1-75: the YAML already includes both TEP and DEP aggregated configs, but verify that metadata.model_name ("glm_5_nvfp4") matches each server_config.model_name and that the two server_configs named "glm5_fp4_tep8_mtp3_8k1k" and "glm5_fp4_dep8_mtp1_8k1k" remain present; also replace the placeholder dataset_file in each client_configs entry with the actual dataset path (or a CI-provided variable) so the perf-sanity jobs can run end-to-end.
tests/scripts/perf-sanity/aggregated/glm5_fp4_blackwell.yaml (1)
1-75: Coverage status: sufficient for this file’s aggregated scope.
For tests/scripts/perf-sanity/aggregated/glm5_fp4_blackwell.yaml, coverage is sufficient in this PR for B200 aggregated perf sanity via both TEP and DEP 8k1k configurations.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/scripts/perf-sanity/aggregated/glm5_fp4_blackwell.yaml` around lines 1-75: coverage for glm5_fp4_blackwell.yaml is already sufficient, so no structural changes are required; however, ensure the client_configs dataset_file placeholder is wired to the test runner by replacing the literal "<dataset_file>" with the CI/test variable your harness expects (e.g., ${DATASET_FILE}) so that the two server configs named "glm5_fp4_tep8_mtp3_8k1k" and "glm5_fp4_dep8_mtp1_8k1k" (and keys metadata.model_name and supported_gpus) run with a real dataset path during execution.
tests/scripts/perf-sanity/disaggregated/gb300_glm-5-fp4_1k1k_con1_ctx1_dep2_gen1_tep4_eplb0_mtp3_ccb-NIXL.yaml (1)
1-94: QA coverage status: sufficient for config-definition scope; execution evidence needs follow-up outside this PR.
Coverage is sufficient across the new GB300 disaggregated config set for this cohort:
- tests/scripts/perf-sanity/disaggregated/gb300_glm-5-fp4_1k1k_con1_ctx1_dep2_gen1_tep4_eplb0_mtp3_ccb-NIXL.yaml
- tests/scripts/perf-sanity/disaggregated/gb300_glm-5-fp4_1k1k_con4096_ctx1_dep2_gen1_dep8_eplb256_mtp1_ccb-NIXL.yaml
- tests/scripts/perf-sanity/disaggregated/gb300_glm-5-fp4_1k1k_con512_ctx1_dep2_gen1_dep32_eplb0_mtp3_ccb-NIXL.yaml
- tests/scripts/perf-sanity/disaggregated/gb300_glm-5-fp4_8k1k_con1024_ctx1_dep2_gen1_dep8_eplb256_mtp1_ccb-NIXL.yaml
- tests/scripts/perf-sanity/disaggregated/gb300_glm-5-fp4_8k1k_con1_ctx1_dep2_gen1_tep8_eplb0_mtp3_ccb-NIXL.yaml
- tests/scripts/perf-sanity/disaggregated/gb300_glm-5-fp4_8k1k_con512_ctx1_dep2_gen1_dep32_eplb0_mtp3_ccb-NIXL.yaml
Actionable follow-up outside this PR: capture CI artifact evidence that the placeholder fields (<partition>, <account>, <dataset_file>, <model_path>) are fully resolved at runtime for each listed file.
As per coding guidelines, "Act as a QA engineer reviewing test changes and coverage for TensorRT-LLM. Keep feedback actionable: suggest concrete list file names and whether coverage is sufficient, insufficient, or needs follow-up outside the PR."
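The six GB300 file names listed above appear to encode each disaggregated setup in a fixed field order. As a minimal sketch, assuming the fields read as GPU platform, model, ISL/OSL pair, concurrency (`con`), context-server count and parallelism (`ctx1_dep2`), generation-server count and parallelism (`gen1_tep4`), EPLB slots, MTP layers, and a `ccb-` suffix naming the KV-cache transfer backend, a hypothetical `parse_config_name()` helper could split a name into typed fields for review. Both the helper and this field interpretation are assumptions, not part of the PR.

```python
import re
from dataclasses import dataclass

# Assumed (hypothetical) reading of the file-name convention:
# <gpu>_<model>_<isl/osl>_con<concurrency>_ctx<N>_<dep|tep><size>_
# gen<N>_<dep|tep><size>_eplb<slots>_mtp<layers>_ccb-<backend>.yaml
_NAME_RE = re.compile(
    r"^(?P<gpu>[a-z]+\d+)_(?P<model>.+?)_(?P<seq>\d+k\d+k)"
    r"_con(?P<concurrency>\d+)"
    r"_ctx(?P<ctx_servers>\d+)_(?P<ctx_mode>dep|tep)(?P<ctx_size>\d+)"
    r"_gen(?P<gen_servers>\d+)_(?P<gen_mode>dep|tep)(?P<gen_size>\d+)"
    r"_eplb(?P<eplb>\d+)_mtp(?P<mtp>\d+)"
    r"_ccb-(?P<backend>[A-Za-z0-9]+)\.yaml$"
)

_INT_FIELDS = {"concurrency", "ctx_servers", "ctx_size", "gen_servers",
               "gen_size", "eplb", "mtp"}


@dataclass(frozen=True)
class DisaggConfigName:
    gpu: str
    model: str
    seq: str
    concurrency: int
    ctx_servers: int
    ctx_mode: str
    ctx_size: int
    gen_servers: int
    gen_mode: str
    gen_size: int
    eplb: int
    mtp: int
    backend: str

    @property
    def total_gpus(self) -> int:
        # GPUs used by all context servers plus all generation servers.
        return self.ctx_servers * self.ctx_size + self.gen_servers * self.gen_size


def parse_config_name(filename: str) -> DisaggConfigName:
    """Split a disaggregated perf-sanity config file name into typed fields."""
    match = _NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unrecognized config name: {filename}")
    fields = {key: int(val) if key in _INT_FIELDS else val
              for key, val in match.groupdict().items()}
    return DisaggConfigName(**fields)


if __name__ == "__main__":
    cfg = parse_config_name(
        "gb300_glm-5-fp4_8k1k_con1024_ctx1_dep2_gen1_dep8_eplb256_mtp1_ccb-NIXL.yaml")
    print(cfg.gen_mode, cfg.gen_size, cfg.eplb, cfg.total_gpus)  # dep 8 256 10
```

Under this reading, `total_gpus` is only a sum of context and generation GPUs; it says nothing about how those GPUs are packed onto nodes.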
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/scripts/perf-sanity/disaggregated/gb300_glm-5-fp4_1k1k_con1_ctx1_dep2_gen1_tep4_eplb0_mtp3_ccb-NIXL.yaml` around lines 1-94: the YAML contains unresolved placeholders (<partition>, <account>, <dataset_file>, <model_path>) that must be validated before job submission; update the test harness or the config generation step to replace those placeholders for the files (e.g., tests/scripts/perf-sanity/disaggregated/gb300_glm-5-fp4_1k1k_con1_ctx1_dep2_gen1_tep4_eplb0_mtp3_ccb-NIXL.yaml and the other listed YAMLs), and add a preflight check that reads the keys partition, account, dataset_file, and model_path and fails early with a clear error if any still match the placeholder pattern; alternatively, wire them to concrete CI variables or templating logic so create_job()/load_config() (or whatever config loader function you use) performs substitution and validation.
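The preflight check suggested above can be sketched without committing to a particular loader. The sketch assumes the config has already been parsed into nested dicts and lists (for example by a YAML loader) and that placeholders are written as `<name>`. `resolve()`, `find_unresolved()`, and `preflight()` are hypothetical helpers, not TensorRT-LLM's `create_job()` or `load_config()`, and the `values` mapping stands in for whatever CI variables the harness provides.

```python
import re
from typing import Any

# Hypothetical helpers (not TensorRT-LLM APIs). Placeholders are assumed to be
# written as <name>, e.g. <partition>, <account>, <dataset_file>, <model_path>.
_PLACEHOLDER_RE = re.compile(r"<([A-Za-z_][A-Za-z0-9_]*)>")


def resolve(node: Any, values: dict[str, str]) -> Any:
    """Return a copy of node with every known <name> placeholder substituted."""
    if isinstance(node, dict):
        return {key: resolve(val, values) for key, val in node.items()}
    if isinstance(node, list):
        return [resolve(val, values) for val in node]
    if isinstance(node, str):
        # Unknown placeholders are left in place so the check below reports them.
        return _PLACEHOLDER_RE.sub(lambda m: values.get(m.group(1), m.group(0)), node)
    return node


def find_unresolved(node: Any, path: str = "") -> list[str]:
    """Return dotted paths of string values that still contain a placeholder."""
    if isinstance(node, dict):
        found = []
        for key, val in node.items():
            found += find_unresolved(val, f"{path}.{key}" if path else str(key))
        return found
    if isinstance(node, list):
        found = []
        for i, val in enumerate(node):
            found += find_unresolved(val, f"{path}[{i}]")
        return found
    if isinstance(node, str) and _PLACEHOLDER_RE.search(node):
        return [path]
    return []


def preflight(config: dict, values: dict[str, str]) -> dict:
    """Substitute placeholders, then fail early if any remain."""
    resolved = resolve(config, values)
    missing = find_unresolved(resolved)
    if missing:
        raise ValueError("unresolved placeholders at: " + ", ".join(missing))
    return resolved


if __name__ == "__main__":
    # Illustrative layout only; the real perf-sanity YAML keys may be nested differently.
    template = {"slurm": {"partition": "<partition>", "account": "<account>"},
                "model_path": "<model_path>"}
    print(preflight(template, {"partition": "p", "account": "a", "model_path": "/models/x"}))
    try:
        preflight({"client_configs": [{"dataset_file": "<dataset_file>"}]}, {})
    except ValueError as err:
        print(err)  # unresolved placeholders at: client_configs[0].dataset_file
```

Returning a new structure instead of mutating the input keeps the checked-in template reusable, and reporting the dotted path points directly at the field that was not wired up.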
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 8f8c1c3a-d00e-4a02-983b-8504f53ec70c
📒 Files selected for processing (28)
- jenkins/L0_MergeRequest.groovy
- jenkins/L0_Test.groovy
- tests/integration/test_lists/test-db/l0_b200_multi_gpus_perf_sanity.yml
- tests/integration/test_lists/test-db/l0_gb200_multi_gpus_perf_sanity.yml
- tests/integration/test_lists/test-db/l0_gb200_multi_nodes_perf_sanity_ctx1_node1_gpu4_gen1_node1_gpu4.yml
- tests/integration/test_lists/test-db/l0_gb200_multi_nodes_perf_sanity_ctx1_node1_gpu4_gen1_node2_gpu8.yml
- tests/integration/test_lists/test-db/l0_gb200_multi_nodes_perf_sanity_ctx1_node1_gpu4_gen1_node8_gpu32.yml
- tests/integration/test_lists/test-db/l0_gb200_multi_nodes_perf_sanity_node2_gpu8.yml
- tests/integration/test_lists/test-db/l0_gb300_multi_gpus_perf_sanity.yml
- tests/integration/test_lists/test-db/l0_gb300_multi_nodes_perf_sanity_ctx1_node1_gpu2_gen1_node1_gpu4.yml
- tests/integration/test_lists/test-db/l0_gb300_multi_nodes_perf_sanity_ctx1_node1_gpu2_gen1_node2_gpu8.yml
- tests/integration/test_lists/test-db/l0_gb300_multi_nodes_perf_sanity_ctx1_node1_gpu2_gen1_node8_gpu32.yml
- tests/integration/test_lists/test-db/l0_gb300_multi_nodes_perf_sanity_ctx1_node1_gpu4_gen1_node1_gpu4.yml
- tests/integration/test_lists/test-db/l0_gb300_multi_nodes_perf_sanity_ctx1_node1_gpu4_gen1_node8_gpu32.yml
- tests/scripts/perf-sanity/aggregated/glm5_fp4_2_nodes_grace_blackwell.yaml
- tests/scripts/perf-sanity/aggregated/glm5_fp4_blackwell.yaml
- tests/scripts/perf-sanity/aggregated/glm5_fp4_grace_blackwell.yaml
- tests/scripts/perf-sanity/disaggregated/gb200_glm-5-fp4_1k1k_con1_ctx1_dep4_gen1_tep4_eplb0_mtp3_ccb-NIXL.yaml
- tests/scripts/perf-sanity/disaggregated/gb200_glm-5-fp4_1k1k_con4096_ctx1_dep4_gen1_dep8_eplb256_mtp1_ccb-NIXL.yaml
- tests/scripts/perf-sanity/disaggregated/gb200_glm-5-fp4_1k1k_con512_ctx1_dep4_gen1_dep32_eplb0_mtp3_ccb-NIXL.yaml
- tests/scripts/perf-sanity/disaggregated/gb200_glm-5-fp4_8k1k_con1024_ctx1_dep4_gen1_dep8_eplb256_mtp1_ccb-NIXL.yaml
- tests/scripts/perf-sanity/disaggregated/gb200_glm-5-fp4_8k1k_con512_ctx1_dep4_gen1_dep32_eplb0_mtp3_ccb-NIXL.yaml
- tests/scripts/perf-sanity/disaggregated/gb300_glm-5-fp4_1k1k_con1_ctx1_dep2_gen1_tep4_eplb0_mtp3_ccb-NIXL.yaml
- tests/scripts/perf-sanity/disaggregated/gb300_glm-5-fp4_1k1k_con4096_ctx1_dep2_gen1_dep8_eplb256_mtp1_ccb-NIXL.yaml
- tests/scripts/perf-sanity/disaggregated/gb300_glm-5-fp4_1k1k_con512_ctx1_dep2_gen1_dep32_eplb0_mtp3_ccb-NIXL.yaml
- tests/scripts/perf-sanity/disaggregated/gb300_glm-5-fp4_8k1k_con1024_ctx1_dep2_gen1_dep8_eplb256_mtp1_ccb-NIXL.yaml
- tests/scripts/perf-sanity/disaggregated/gb300_glm-5-fp4_8k1k_con1_ctx1_dep2_gen1_tep8_eplb0_mtp3_ccb-NIXL.yaml
- tests/scripts/perf-sanity/disaggregated/gb300_glm-5-fp4_8k1k_con512_ctx1_dep2_gen1_dep32_eplb0_mtp3_ccb-NIXL.yaml
💤 Files with no reviewable changes (2)
- tests/integration/test_lists/test-db/l0_gb300_multi_nodes_perf_sanity_ctx1_node1_gpu4_gen1_node1_gpu4.yml
- tests/integration/test_lists/test-db/l0_gb300_multi_nodes_perf_sanity_ctx1_node1_gpu4_gen1_node8_gpu32.yml
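The follow-up in the `glm5_fp4_grace_blackwell.yaml` nitpick asks whether the test-db lists actually reference the three new aggregated configs. A rough sketch of that check is below. It assumes a test-db list references a config by its file stem, bounded by non-word characters; the real test-ID format in these lists may differ. The helper name and glob patterns are illustrative, while the directory paths come from the file list above.

```python
import re
from pathlib import Path
from typing import Iterable, Union


def unreferenced_configs(config_paths: Iterable[Union[str, Path]],
                         test_list_texts: Iterable[str]) -> list[str]:
    """Return config stems that no test-db list mentions.

    Assumes a list references a config by its file stem, bounded by
    non-word characters (e.g. '-' or '['); the real test-ID format may differ.
    """
    combined = "\n".join(test_list_texts)
    missing = []
    for path in config_paths:
        stem = Path(path).stem
        # Word-boundary lookarounds keep "glm5_fp4_blackwell" from matching
        # inside a longer stem such as "glm5_fp4_blackwell_foo".
        pattern = rf"(?<!\w){re.escape(stem)}(?!\w)"
        if re.search(pattern, combined) is None:
            missing.append(stem)
    return missing


if __name__ == "__main__":
    # Directory names come from the PR's file list; the glob patterns are assumptions.
    test_db = Path("tests/integration/test_lists/test-db")
    texts = [p.read_text() for p in sorted(test_db.glob("l0_*perf_sanity*.yml"))]
    configs = sorted(Path("tests/scripts/perf-sanity/aggregated").glob("glm5_fp4_*.yaml"))
    print("unreferenced:", unreferenced_configs(configs, texts))
```

A plain substring test would be simpler, but the boundary check avoids false positives between similarly named stems.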
/bot run --disable-fail-fast --stage-list "GB200-8_GPUs-2_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE1-GPU4-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE1-GPU4-Post-Merge-2,GB200-8_GPUs-2_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE1-GPU4-Post-Merge-3,GB200-8_GPUs-2_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE1-GPU4-Post-Merge-4,GB200-8_GPUs-2_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE1-GPU4-Post-Merge-5,GB200-8_GPUs-2_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE1-GPU4-Post-Merge-6,GB200-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE2-GPU8-Post-Merge-1,GB200-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE2-GPU8-Post-Merge-2,GB200-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE2-GPU8-Post-Merge-3,GB200-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE2-GPU8-Post-Merge-4,GB200-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE2-GPU8-Post-Merge-5,GB200-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE2-GPU8-Post-Merge-6,GB200-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE2-GPU8-Post-Merge-7,GB200-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE2-GPU8-Post-Merge-8,GB200-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE2-GPU8-Post-Merge-9,GB200-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE2-GPU8-Post-Merge-10,GB200-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE2-GPU8-Post-Merge-11,GB200-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE2-GPU8-Post-Merge-12,GB200-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE2-GPU8-Post-Merge-13,GB200-36_GPUs-9_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE8-GPU32-Post-Merge-1,GB200-36_GPUs-9_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE8-GPU32-Post-Merge-2,GB200-36_GPUs-9_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE8-GPU32-Post-Merge-3,GB200-36_GPUs-9_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE8-GPU32-Post-Merge-4,GB200-36_GPUs-9_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE8-GPU32-Post-Merge-5,GB200-36_GPUs-9_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE8-GPU32-Post-Merge-6,GB200-36_GPUs-9_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE8-GPU32-Post-Merge-7,GB200-36_GPUs-9_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE8-GPU32-Post-Merge-8,GB200-36_GPUs-9_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE8-GPU32-Post-Merge-9,GB200-36_GPUs-9_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE8-GPU32-Post-Merge-10,GB200-36_GPUs-9_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE8-GPU32-Post-Merge-11,GB200-36_GPUs-9_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE8-GPU32-Post-Merge-12,GB200-36_GPUs-9_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE8-GPU32-Post-Merge-13,GB300-8_GPUs-2_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU2-GEN1-NODE1-GPU4-Post-Merge-1,GB300-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU2-GEN1-NODE2-GPU8-Post-Merge-1,GB300-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU2-GEN1-NODE2-GPU8-Post-Merge-2,GB300-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU2-GEN1-NODE2-GPU8-Post-Merge-3,GB300-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU2-GEN1-NODE2-GPU8-Post-Merge-4,GB300-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU2-GEN1-NODE2-GPU8-Post-Merge-5,GB300-36_GPUs-9_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU2-GEN1-NODE8-GPU32-Post-Merge-1,GB300-36_GPUs-9_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU2-GEN1-NODE8-GPU32-Post-Merge-2,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-1,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-2,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-3,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-4,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-5,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-6,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-7,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-2,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-3,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-4,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-5,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-6,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-7,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-8,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-9"
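The disaggregated stage names in this `--stage-list` also encode resource counts, and they appear to add up. For example, `GB200-12_GPUs-3_Nodes ... CTX1-NODE1-GPU4-GEN1-NODE2-GPU8` pairs 1 + 2 = 3 nodes with 4 + 8 = 12 GPUs. The sketch below checks that arithmetic for every disaggregated stage. It assumes 4 GPUs per GB200/GB300 node and reads `NODE<n>-GPU<n>` as per-role totals. Under that reading, the GB300 `CTX1-NODE1-GPU2` stages use only 2 of the context node's 4 GPUs, so the check allows CTX + GEN GPUs to fall below the allocated total. The regex, the constant, and the helper names are hypothetical.

```python
import re

# Assumed (hypothetical) reading of disaggregated stage names:
# <platform>-<gpus>_GPUs-<nodes>_Nodes-PyTorch-Disagg-PerfSanity-
# CTX<n>-NODE<n>-GPU<n>-GEN<n>-NODE<n>-GPU<n>-Post-Merge-<shard>
_STAGE_RE = re.compile(
    r"^(?P<platform>GB\d+)-(?P<gpus>\d+)_GPUs-(?P<nodes>\d+)_Nodes"
    r"-PyTorch-Disagg-PerfSanity"
    r"-CTX(?P<ctx>\d+)-NODE(?P<ctx_nodes>\d+)-GPU(?P<ctx_gpus>\d+)"
    r"-GEN(?P<gen>\d+)-NODE(?P<gen_nodes>\d+)-GPU(?P<gen_gpus>\d+)"
    r"-Post-Merge-(?P<shard>\d+)$"
)
GPUS_PER_NODE = 4  # assumption: 4 GPUs per GB200/GB300 compute node


def check_disagg_stage(stage: str) -> list[str]:
    """Return the inconsistencies found in one disaggregated stage name."""
    match = _STAGE_RE.match(stage)
    if match is None:
        return [f"not a disaggregated stage name: {stage}"]
    v = {k: int(x) if x.isdigit() else x for k, x in match.groupdict().items()}
    problems = []
    if v["ctx_nodes"] + v["gen_nodes"] != v["nodes"]:
        problems.append("CTX + GEN nodes != total nodes")
    if v["nodes"] * GPUS_PER_NODE != v["gpus"]:
        problems.append("total GPUs != nodes * GPUS_PER_NODE")
    if v["ctx_gpus"] + v["gen_gpus"] > v["gpus"]:
        problems.append("CTX + GEN GPUs exceed allocated GPUs")
    return problems


def check_stage_list(stage_list: str) -> dict[str, list[str]]:
    """Check every disaggregated stage in a comma-separated stage list."""
    results = {}
    for stage in (s.strip() for s in stage_list.split(",")):
        if "-Disagg-" in stage:
            problems = check_disagg_stage(stage)
            if problems:
                results[stage] = problems
    return results


if __name__ == "__main__":
    sample = ("GB200-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE2-GPU8-Post-Merge-1,"
              "GB300-8_GPUs-2_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU2-GEN1-NODE1-GPU4-Post-Merge-1,"
              "GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-1")
    print(check_stage_list(sample))  # {}
```

Aggregated stages such as `GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-1` carry no CTX/GEN split, so the sketch skips them rather than guessing at their layout.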
PR_Github #52219 [ run ] triggered by Bot. Commit:
PR_Github #52292 [ run ] triggered by Bot. Commit:
PR_Github #52219 [ run ] completed with state
PR_Github #52292 [ run ] completed with state
/bot run --disable-fail-fast --stage-list "GB200-8_GPUs-2_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE1-GPU4-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE1-GPU4-Post-Merge-2,GB200-8_GPUs-2_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE1-GPU4-Post-Merge-3,GB200-8_GPUs-2_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE1-GPU4-Post-Merge-4,GB200-8_GPUs-2_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE1-GPU4-Post-Merge-5,GB200-8_GPUs-2_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE1-GPU4-Post-Merge-6,GB200-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE2-GPU8-Post-Merge-1,GB200-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE2-GPU8-Post-Merge-2,GB200-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE2-GPU8-Post-Merge-3,GB200-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE2-GPU8-Post-Merge-4,GB200-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE2-GPU8-Post-Merge-5,GB200-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE2-GPU8-Post-Merge-6,GB200-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE2-GPU8-Post-Merge-7,GB200-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE2-GPU8-Post-Merge-8,GB200-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE2-GPU8-Post-Merge-9,GB200-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE2-GPU8-Post-Merge-10,GB200-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE2-GPU8-Post-Merge-11,GB200-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE2-GPU8-Post-Merge-12,GB200-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE2-GPU8-Post-Merge-13,GB200-36_GPUs-9_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE8-GPU32-Post-Merge-1,GB200-36_GPUs-9_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE8-GPU32-Post-Merge-2,GB200-36_GPUs-9_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE8-GPU32-Post-Merge-3,GB200-36_GPUs-9_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE8-GPU32-Post-Merge-4,GB200-36_GPUs-9_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE8-GPU32-Post-Merge-5,GB200-36_GPUs-9_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE8-GPU32-Post-Merge-6,GB200-36_GPUs-9_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE8-GPU32-Post-Merge-7,GB200-36_GPUs-9_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE8-GPU32-Post-Merge-8,GB200-36_GPUs-9_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE8-GPU32-Post-Merge-9,GB200-36_GPUs-9_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE8-GPU32-Post-Merge-10,GB200-36_GPUs-9_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE8-GPU32-Post-Merge-11,GB200-36_GPUs-9_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE8-GPU32-Post-Merge-12,GB200-36_GPUs-9_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE8-GPU32-Post-Merge-13,GB300-8_GPUs-2_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU2-GEN1-NODE1-GPU4-Post-Merge-1,GB300-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU2-GEN1-NODE2-GPU8-Post-Merge-1,GB300-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU2-GEN1-NODE2-GPU8-Post-Merge-2,GB300-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU2-GEN1-NODE2-GPU8-Post-Merge-3,GB300-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU2-GEN1-NODE2-GPU8-Post-Merge-4,GB300-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU2-GEN1-NODE2-GPU8-Post-Merge-5,GB300-36_GPUs-9_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU2-GEN1-NODE8-GPU32-Post-Merge-1,GB300-36_GPUs-9_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU2-GEN1-NODE8-GPU32-Post-Merge-2,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-1,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-2,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-3,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-4,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-5,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-6,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-7,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-2,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-3,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-4,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-5,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-6,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-7,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-8,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-9"
PR_Github #52308 [ run ] triggered by Bot. Commit:
PR_Github #52308 [ run ] completed with state
Summary by CodeRabbit
Tests
Chores
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
If PR introduces API changes, an appropriate PR label is added: either `api-compatible` or `api-breaking`. For `api-breaking`, include `BREAKING` in the PR title.
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment `/bot help`.