Skip to content

[None][infra] Reduce Docker image layer count in release stage#14972

Open
tburt-nv wants to merge 2 commits into
NVIDIA:mainfrom
tburt-nv:reduce-docker-layers
Open

[None][infra] Reduce Docker image layer count in release stage#14972
tburt-nv wants to merge 2 commits into
NVIDIA:mainfrom
tburt-nv:reduce-docker-layers

Conversation

@tburt-nv
Copy link
Copy Markdown
Collaborator

@tburt-nv tburt-nv commented Jun 4, 2026

Description

This PR is intended to supersede @janbernloehr's PR #11987, since it needs some minor tweaks.

Here is Jan's original summary:

Follow-up to #11720 which already consolidated the devel stage layers using bind mounts.

This PR applies the same pattern to the release stage:

Replace 7 COPY layers + 2 RUN layers with a single RUN using --mount=type=bind and cp, consolidating all file copies, symlink creation, and cleanup into one layer
Replace the OSS attribution COPY + RUN + rm with a single RUN using a bind mount
Reduces the release stage from ~13 layers to 3 (pip install, copy+setup+cleanup, OSS attribution), saving ~10 layers in the published image and leaving more headroom under Docker's 128-layer overlay2 limit for downstream consumers.

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Summary by CodeRabbit

  • Chores
    • Updated container image configurations to optimize the build process and improve release pipeline efficiency.

Consolidate all COPY and RUN steps in the release stage into a single
RUN using bind mounts and cp. This replaces 7 COPY layers and 2 RUN
layers with a single RUN layer. Also use a bind mount for the OSS
attribution script instead of COPY+RUN+rm.

Reduces the release stage from ~13 layers to 3 (pip install, copy+setup,
OSS attribution), saving ~10 layers in the published image.

Signed-off-by: jbernloehr <jbernloehr@nvidia.com>
@tburt-nv tburt-nv requested review from a team as code owners June 4, 2026 19:12
@tburt-nv tburt-nv requested review from yiqingy0 and zeroepoch June 4, 2026 19:12
@tburt-nv
Copy link
Copy Markdown
Collaborator Author

tburt-nv commented Jun 4, 2026

/bot run --stage-list "Build-Docker-Images"

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #52147 [ run ] triggered by Bot. Commit: 61b711c Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #52147 [ run ] completed with state FAILURE. Commit: 61b711c
/LLM/main/L0_MergeRequest_PR pipeline #41469 (Partly Tested) completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>
@tburt-nv tburt-nv requested a review from a team as a code owner June 5, 2026 16:01
@tburt-nv
Copy link
Copy Markdown
Collaborator Author

tburt-nv commented Jun 5, 2026

/bot run

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #52374 [ run ] triggered by Bot. Commit: 4aefe77 Link to invocation

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 5, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

The PR optimizes the Docker release stage by replacing sequential COPY statements with BuildKit bind mounts for more efficient artifact assembly. It removes benchmark source files, validates library presence, creates package symlinks, and refactors OSS attribution generation. Jenkins CI image tag properties are updated to reference the newly built images.

Changes

Docker Build and CI Image Updates

Layer / File(s) Summary
Release stage BuildKit bind mount optimization
docker/Dockerfile.multi
Release stage consolidates README, docs, headers, examples, and wheel artifacts into a single bind-mount RUN operation instead of separate COPY statements. Removes benchmark sources, creates bin/lib symlinks, validates the TensorRT-LLM plugin library, updates loader config, and verifies executorWorker binary linking. OSS attribution script is mounted directly instead of copied.
Jenkins CI image tag updates
jenkins/current_image_tags.properties
All five LLM-related Docker image tags are updated from build 202606012126-14025 to 202606051544-14972, referencing the optimized release images.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately captures the main change: consolidating Docker image layers in the release stage using bind mounts, which directly matches the primary objectives of the pull request.
Description check ✅ Passed The PR description clearly explains the objective, references the superseded PR, includes Jan's original summary with technical details, and includes a completed checklist. However, the Test Coverage section is empty/incomplete.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docker/Dockerfile.multi`:
- Line 165: The ldd predicate can silently pass when there are no tensorrt_llm
lines; change the check to first assert the dependency exists and then assert it
is not "not found" — e.g. run ldd -v bin/executorWorker and test with grep -q
'tensorrt_llm' (fail if missing) and only then check that the matched lines do
NOT contain "not found" (e.g. ! grep tensorrt_llm | grep -q "not found");
reference the binary name bin/executorWorker and the dependency token
tensorrt_llm when updating the Dockerfile multi-line gate.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 14825cce-cb85-4f12-b17e-5176b75b12ae

📥 Commits

Reviewing files that changed from the base of the PR and between 910826b and 4aefe77.

📒 Files selected for processing (2)
  • docker/Dockerfile.multi
  • jenkins/current_image_tags.properties

Comment thread docker/Dockerfile.multi
test -f lib/libnvinfer_plugin_tensorrt_llm.so && \
echo "/app/tensorrt_llm/lib" > /etc/ld.so.conf.d/tensorrt_llm.conf && \
ldconfig && \
! ( ldd -v bin/executorWorker | grep tensorrt_llm | grep -q "not found" ) && \
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Demonstrate why the current predicate can pass when tensorrt_llm is absent.
set -euo pipefail
if ! ( printf 'libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6\n' | grep tensorrt_llm | grep -q "not found" ); then
  echo "Current check passes even with no tensorrt_llm line (false negative)."
fi

Repository: NVIDIA/TensorRT-LLM

Length of output: 133


🏁 Script executed:

#!/bin/bash
set -euo pipefail
nl -ba docker/Dockerfile.multi | sed -n '145,190p'

Repository: NVIDIA/TensorRT-LLM

Length of output: 106


🏁 Script executed:

#!/bin/bash
set -euo pipefail
awk 'NR>=145 && NR<=190 { printf "%d:%s\n", NR, $0 }' docker/Dockerfile.multi

Repository: NVIDIA/TensorRT-LLM

Length of output: 2347


🏁 Script executed:

#!/bin/bash
set -euo pipefail
rg -n "SHELL|pipefail" docker/Dockerfile.multi || true

Repository: NVIDIA/TensorRT-LLM

Length of output: 92


Strengthen the ldd gate to fail when executorWorker has no tensorrt_llm dependency lines. (docker/Dockerfile.multi:165)

The current predicate can pass when ldd output contains no tensorrt_llm matches at all, since grep tensorrt_llm producing no input also makes grep -q "not found" return non-match, which is then inverted by !.

Suggested change
-    ! ( ldd -v bin/executorWorker | grep tensorrt_llm | grep -q "not found" ) && \
+    ldd -v bin/executorWorker | grep -q "tensorrt_llm" && \
+    ! ( ldd -v bin/executorWorker | grep "tensorrt_llm" | grep -q "not found" ) && \
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
! ( ldd -v bin/executorWorker | grep tensorrt_llm | grep -q "not found" ) && \
ldd -v bin/executorWorker | grep -q "tensorrt_llm" && \
! ( ldd -v bin/executorWorker | grep "tensorrt_llm" | grep -q "not found" ) && \
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docker/Dockerfile.multi` at line 165, The ldd predicate can silently pass
when there are no tensorrt_llm lines; change the check to first assert the
dependency exists and then assert it is not "not found" — e.g. run ldd -v
bin/executorWorker and test with grep -q 'tensorrt_llm' (fail if missing) and
only then check that the matched lines do NOT contain "not found" (e.g. ! grep
tensorrt_llm | grep -q "not found"); reference the binary name
bin/executorWorker and the dependency token tensorrt_llm when updating the
Dockerfile multi-line gate.

@tburt-nv
Copy link
Copy Markdown
Collaborator Author

tburt-nv commented Jun 5, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #52374 [ run ] completed with state SUCCESS. Commit: 4aefe77
/LLM/main/L0_MergeRequest_PR pipeline #41672 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #52388 [ run ] triggered by Bot. Commit: 4aefe77 Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #52388 [ run ] completed with state FAILURE. Commit: 4aefe77

Link to invocation

@tburt-nv
Copy link
Copy Markdown
Collaborator Author

tburt-nv commented Jun 5, 2026

/bot run --disable-fail-fast

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants