[None][infra] Reduce Docker image layer count in release stage by tburt-nv · Pull Request #14972 · NVIDIA/TensorRT-LLM

tburt-nv · 2026-06-04T19:12:20Z

Description

This PR is intended to supersede @janbernloehr's PR #11987, since it needs some minor tweaks.

Here is Jan's original summary:

Follow-up to #11720 which already consolidated the devel stage layers using bind mounts.

This PR applies the same pattern to the release stage:

Replace 7 COPY layers + 2 RUN layers with a single RUN using --mount=type=bind and cp, consolidating all file copies, symlink creation, and cleanup into one layer
Replace the OSS attribution COPY + RUN + rm with a single RUN using a bind mount
Reduces the release stage from ~13 layers to 3 (pip install, copy+setup+cleanup, OSS attribution), saving ~10 layers in the published image and leaving more headroom under Docker's 128-layer overlay2 limit for downstream consumers.

Test Coverage

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Summary by CodeRabbit

Chores
- Updated container image configurations to optimize the build process and improve release pipeline efficiency.

Consolidate all COPY and RUN steps in the release stage into a single RUN using bind mounts and cp. This replaces 7 COPY layers and 2 RUN layers with a single RUN layer. Also use a bind mount for the OSS attribution script instead of COPY+RUN+rm. Reduces the release stage from ~13 layers to 3 (pip install, copy+setup, OSS attribution), saving ~10 layers in the published image. Signed-off-by: jbernloehr <jbernloehr@nvidia.com>

tburt-nv · 2026-06-04T19:12:43Z

/bot run --stage-list "Build-Docker-Images"

tensorrt-cicd · 2026-06-04T19:19:16Z

PR_Github #52147 [ run ] triggered by Bot. Commit: 61b711c Link to invocation

tensorrt-cicd · 2026-06-05T02:05:54Z

PR_Github #52147 [ run ] completed with state FAILURE. Commit: 61b711c
/LLM/main/L0_MergeRequest_PR pipeline #41469 (Partly Tested) completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>

tburt-nv · 2026-06-05T16:01:53Z

/bot run

tensorrt-cicd · 2026-06-05T16:08:46Z

PR_Github #52374 [ run ] triggered by Bot. Commit: 4aefe77 Link to invocation

coderabbitai · 2026-06-05T16:09:44Z

📝 Walkthrough

Walkthrough

The PR optimizes the Docker release stage by replacing sequential COPY statements with BuildKit bind mounts for more efficient artifact assembly. It removes benchmark source files, validates library presence, creates package symlinks, and refactors OSS attribution generation. Jenkins CI image tag properties are updated to reference the newly built images.

Changes

Docker Build and CI Image Updates

Layer / File(s)	Summary
Release stage BuildKit bind mount optimization `docker/Dockerfile.multi`	Release stage consolidates README, docs, headers, examples, and wheel artifacts into a single bind-mount RUN operation instead of separate COPY statements. Removes benchmark sources, creates bin/lib symlinks, validates the TensorRT-LLM plugin library, updates loader config, and verifies executorWorker binary linking. OSS attribution script is mounted directly instead of copied.
Jenkins CI image tag updates `jenkins/current_image_tags.properties`	All five LLM-related Docker image tags are updated from build `202606012126-14025` to `202606051544-14972`, referencing the optimized release images.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately captures the main change: consolidating Docker image layers in the release stage using bind mounts, which directly matches the primary objectives of the pull request.
Description check	✅ Passed	The PR description clearly explains the objective, references the superseded PR, includes Jan's original summary with technical details, and includes a completed checklist. However, the Test Coverage section is empty/incomplete.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docker/Dockerfile.multi`:
- Line 165: The ldd predicate can silently pass when there are no tensorrt_llm
lines; change the check to first assert the dependency exists and then assert it
is not "not found" — e.g. run ldd -v bin/executorWorker and test with grep -q
'tensorrt_llm' (fail if missing) and only then check that the matched lines do
NOT contain "not found" (e.g. ! grep tensorrt_llm | grep -q "not found");
reference the binary name bin/executorWorker and the dependency token
tensorrt_llm when updating the Dockerfile multi-line gate.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 14825cce-cb85-4f12-b17e-5176b75b12ae

📥 Commits

Reviewing files that changed from the base of the PR and between 910826b and 4aefe77.

📒 Files selected for processing (2)

docker/Dockerfile.multi
jenkins/current_image_tags.properties

coderabbitai · 2026-06-05T16:09:48Z

+    test -f lib/libnvinfer_plugin_tensorrt_llm.so && \
+    echo "/app/tensorrt_llm/lib" > /etc/ld.so.conf.d/tensorrt_llm.conf && \
+    ldconfig && \
+    ! ( ldd -v bin/executorWorker | grep tensorrt_llm | grep -q "not found" ) && \


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Demonstrate why the current predicate can pass when tensorrt_llm is absent. set -euo pipefail if ! ( printf 'libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6\n' | grep tensorrt_llm | grep -q "not found" ); then echo "Current check passes even with no tensorrt_llm line (false negative)." fi

Repository: NVIDIA/TensorRT-LLM

Length of output: 133

🏁 Script executed:

#!/bin/bash set -euo pipefail nl -ba docker/Dockerfile.multi | sed -n '145,190p'

Repository: NVIDIA/TensorRT-LLM

Length of output: 106

🏁 Script executed:

#!/bin/bash set -euo pipefail awk 'NR>=145 && NR<=190 { printf "%d:%s\n", NR, $0 }' docker/Dockerfile.multi

Repository: NVIDIA/TensorRT-LLM

Length of output: 2347

🏁 Script executed:

#!/bin/bash set -euo pipefail rg -n "SHELL|pipefail" docker/Dockerfile.multi || true

Repository: NVIDIA/TensorRT-LLM

Length of output: 92

Strengthen the ldd gate to fail when executorWorker has no tensorrt_llm dependency lines. (docker/Dockerfile.multi:165)

The current predicate can pass when ldd output contains no tensorrt_llm matches at all, since grep tensorrt_llm producing no input also makes grep -q "not found" return non-match, which is then inverted by !.

Suggested change

- ! ( ldd -v bin/executorWorker | grep tensorrt_llm | grep -q "not found" ) && \ + ldd -v bin/executorWorker | grep -q "tensorrt_llm" && \ + ! ( ldd -v bin/executorWorker | grep "tensorrt_llm" | grep -q "not found" ) && \

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

! ( ldd -v bin/executorWorker | grep tensorrt_llm | grep -q "not found" ) && \

ldd -v bin/executorWorker | grep -q "tensorrt_llm" && \

! ( ldd -v bin/executorWorker | grep "tensorrt_llm" | grep -q "not found" ) && \

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docker/Dockerfile.multi` at line 165, The ldd predicate can silently pass when there are no tensorrt_llm lines; change the check to first assert the dependency exists and then assert it is not "not found" — e.g. run ldd -v bin/executorWorker and test with grep -q 'tensorrt_llm' (fail if missing) and only then check that the matched lines do NOT contain "not found" (e.g. ! grep tensorrt_llm | grep -q "not found"); reference the binary name bin/executorWorker and the dependency token tensorrt_llm when updating the Dockerfile multi-line gate.

tburt-nv · 2026-06-05T16:53:39Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-06-05T16:56:08Z

PR_Github #52374 [ run ] completed with state SUCCESS. Commit: 4aefe77
/LLM/main/L0_MergeRequest_PR pipeline #41672 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

tensorrt-cicd · 2026-06-05T17:00:04Z

PR_Github #52388 [ run ] triggered by Bot. Commit: 4aefe77 Link to invocation

tensorrt-cicd · 2026-06-05T17:01:20Z

PR_Github #52388 [ run ] completed with state FAILURE. Commit: 4aefe77

Link to invocation

tburt-nv · 2026-06-05T19:31:27Z

/bot run --disable-fail-fast

tburt-nv requested review from a team as code owners June 4, 2026 19:12

tburt-nv requested review from yiqingy0 and zeroepoch June 4, 2026 19:12

github-actions Bot assigned tburt-nv Jun 4, 2026

zeroepoch approved these changes Jun 5, 2026

View reviewed changes

update CI images

4aefe77

Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>

tburt-nv requested a review from a team as a code owner June 5, 2026 16:01

coderabbitai Bot reviewed Jun 5, 2026

View reviewed changes

	! ( ldd -v bin/executorWorker \| grep tensorrt_llm \| grep -q "not found" ) && \
	ldd -v bin/executorWorker \| grep -q "tensorrt_llm" && \
	! ( ldd -v bin/executorWorker \| grep "tensorrt_llm" \| grep -q "not found" ) && \

Conversation

tburt-nv commented Jun 4, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Test Coverage

PR Checklist

GitHub Bot Help

Summary by CodeRabbit

Uh oh!

tburt-nv commented Jun 4, 2026

Uh oh!

tensorrt-cicd commented Jun 4, 2026

Uh oh!

tensorrt-cicd commented Jun 5, 2026

Uh oh!

tburt-nv commented Jun 5, 2026

Uh oh!

tensorrt-cicd commented Jun 5, 2026

Uh oh!

coderabbitai Bot commented Jun 5, 2026

Walkthrough

Changes

Estimated code review effort

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

tburt-nv commented Jun 5, 2026

Uh oh!

tensorrt-cicd commented Jun 5, 2026

Uh oh!

tensorrt-cicd commented Jun 5, 2026

Uh oh!

tensorrt-cicd commented Jun 5, 2026

Uh oh!

tburt-nv commented Jun 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

tburt-nv commented Jun 4, 2026 •

edited by coderabbitai Bot

Loading