ROX-19064: Scanner V4 CI resource/replica tuning #19834

Open
dcaravel wants to merge 4 commits into master from dc/scan4-resource-tuning

@dcaravel dcaravel commented Apr 5, 2026

Description

Scanner V4 components (indexer, matcher, DB) were being OOM-killed and/or failing to schedule (insufficient node CPU) with the existing resource allocations. This PR adjusts the settings to help stabilize the deployments.

Each deploy path (manifest, Helm, and operator) has its own tuning mechanism:

  • Helm: deploy/common/ci-values.yaml
  • Manifest: deploy/common/k8sbased.sh + scanner-v4-*-patch.yaml
  • Operator: tests/e2e/yaml/central-cr.envsubst.yaml

User-facing documentation

Testing and quality

  • the change is production ready: the change is GA, or otherwise the functionality is gated by a feature flag
  • CI results are inspected

Automated testing

The changes themselves are tests

How I validated my change

Against StackRox Scanner, these changes will be tested by CI as part of this PR.

Against Scanner V4, these changes were validated in #19236 and will be validated again in a future PR when Scanner V4 is officially turned on in CI.

openshift-ci bot commented Apr 5, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

dcaravel commented Apr 5, 2026

/test all

codecov bot commented Apr 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 49.56%. Comparing base (2d5d7a2) to head (d54a5ed).
⚠️ Report is 67 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #19834      +/-   ##
==========================================
- Coverage   49.60%   49.56%   -0.04%     
==========================================
  Files        2763     2764       +1     
  Lines      208339   208357      +18     
==========================================
- Hits       103341   103269      -72     
- Misses      97331    97436     +105     
+ Partials     7667     7652      -15     
Flag Coverage Δ
go-unit-tests 49.56% <ø> (-0.04%) ⬇️

github-actions bot commented Apr 5, 2026

🚀 Build Images Ready

Images are ready for commit d54a5ed. To use with deploy scripts:

export MAIN_IMAGE_TAG=4.11.x-564-gd54a5edd01

dcaravel commented Apr 5, 2026

/test ocp-4-21-qa-e2e-tests

dcaravel added the auto-retest label (PRs with this label will be automatically retested if prow checks fails) Apr 6, 2026

dcaravel commented Apr 6, 2026

/test ocp-4-21-qa-e2e-tests

dcaravel marked this pull request as ready for review April 6, 2026 13:50
dcaravel requested review from a team and mclasmeier April 6, 2026 13:59
@rhacs-bot
Contributor

/retest

@BradLugo BradLugo left a comment

Might be worth checking in with folks familiar with the perf & scale initiative (@robbycochran, @davdhacs, @mtodor, and @jvdm come to mind), but LGTM.

dcaravel commented Apr 7, 2026

> Might be worth checking in with folks familiar with the perf & scale initiative (@robbycochran, @davdhacs, @mtodor, and @jvdm come to mind), but LGTM.

Any concerns @robbycochran, @davdhacs, @mtodor, and @jvdm?

@jvdm jvdm left a comment

> This PR makes adjustments to sync the settings of the different install methods and help stabilize the deployments.

The PR aligns Helm and Manifest with each other, but leaves the Operator CR indexer (and partially matcher/db) misaligned. Is that intentional?

Also, I'm assuming you're not aligning with the defaults. My only concern when comparing with the defaults is that we should not request or limit more than they do.
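The constraint raised here (CI overrides should not request or limit more than the installer defaults) can be checked mechanically. The following is only a minimal sketch: the function names and every resource value are hypothetical and are not taken from this PR or from the StackRox defaults. It compares Kubernetes-style quantities such as `500m` or `2Gi`.

```python
from decimal import Decimal

# Multipliers for common Kubernetes quantity suffixes (binary and decimal).
_SUFFIXES = {
    "Ki": Decimal(2) ** 10, "Mi": Decimal(2) ** 20,
    "Gi": Decimal(2) ** 30, "Ti": Decimal(2) ** 40,
    "m": Decimal("0.001"), "k": Decimal(10) ** 3,
    "M": Decimal(10) ** 6, "G": Decimal(10) ** 9,
}


def parse_quantity(q: str) -> Decimal:
    """Convert a quantity such as '500m', '2', or '1500Mi' to a plain number."""
    # Try two-character suffixes before one-character ones.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if q.endswith(suffix):
            return Decimal(q[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(q)


def exceeding(defaults: dict, overrides: dict) -> list[str]:
    """Return 'kind.resource' keys where an override is larger than its default."""
    found = []
    for kind in ("requests", "limits"):
        for resource, value in overrides.get(kind, {}).items():
            default = defaults.get(kind, {}).get(resource)
            if default is not None and parse_quantity(value) > parse_quantity(default):
                found.append(f"{kind}.{resource}")
    return found


# Hypothetical values, for illustration only.
example_defaults = {
    "requests": {"cpu": "1", "memory": "1500Mi"},
    "limits": {"cpu": "2", "memory": "3Gi"},
}
example_overrides = {
    "requests": {"cpu": "400m", "memory": "2Gi"},
    "limits": {"cpu": "2"},
}
print(exceeding(example_defaults, example_overrides))
```

With these made-up numbers, only the memory request is flagged, because 2Gi is larger than 1500Mi while the CPU request and limit stay at or below their defaults.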

dcaravel commented Apr 9, 2026

@jvdm

Great callout. I'm going to revisit this; agreed that the limits shouldn't exceed the defaults.

@dcaravel
Contributor Author

Took a different approach: all customized CPU/memory limits and memory requests have been removed, and the installer default values will be used instead.

The CPU requests are still customized, as those were directly preventing scheduling in CI. This approach tested OK with Scanner V4 enabled for all CI. It will be tested again in a future PR when the Scanner V4 switch is officially turned on.
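The approach described above amounts to reducing each component's resource override to its CPU request and letting everything else fall back to installer defaults. A minimal sketch of that reduction follows; the function name and the values are hypothetical, and it does not reflect how the PR's YAML files or scripts are actually structured.

```python
def cpu_requests_only(resources: dict) -> dict:
    """Keep only requests.cpu from a Kubernetes-style resources mapping.

    Dropped keys (CPU/memory limits, memory requests) are left for the
    installer defaults to supply.
    """
    cpu = resources.get("requests", {}).get("cpu")
    return {"requests": {"cpu": cpu}} if cpu is not None else {}


# Hypothetical override, for illustration only.
example_custom = {
    "requests": {"cpu": "400m", "memory": "2Gi"},
    "limits": {"cpu": "2", "memory": "4Gi"},
}
print(cpu_requests_only(example_custom))
```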

dcaravel requested review from BradLugo and jvdm April 11, 2026 00:31