ROX-19064: Scanner V4 CI resource/replica tuning#19834
|
Skipping CI for Draft Pull Request. |
|
/test all |
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## master #19834 +/- ##
==========================================
- Coverage 49.60% 49.56% -0.04%
==========================================
Files 2763 2764 +1
Lines 208339 208357 +18
==========================================
- Hits 103341 103269 -72
- Misses 97331 97436 +105
+ Partials 7667 7652 -15
Flags with carried forward coverage won't be shown.
|
🚀 Build Images Ready
Images are ready for commit d54a5ed. To use with deploy scripts: export MAIN_IMAGE_TAG=4.11.x-564-gd54a5edd01 |
|
/test ocp-4-21-qa-e2e-tests |
|
/test ocp-4-21-qa-e2e-tests |
|
/retest |
BradLugo
left a comment
Might be worth checking in with folks familiar with the perf & scale initiative (@robbycochran, @davdhacs, @mtodor, and @jvdm come to mind), but LGTM.
Any concerns @robbycochran, @davdhacs, @mtodor, and @jvdm? |
jvdm
left a comment
This PR makes adjustments to sync the settings of the different install methods and help stabilize the deployments.
The PR aligns Helm and Manifest with each other, but leaves the Operator CR indexer (and partially matcher/db) misaligned. Is that intentional?
Also, I'm assuming you're not aligning with the defaults. My only thought on comparing with the defaults is that we should not request or limit more than they do.
|
Great callout - going to revisit this - agreed that the limits shouldn't exceed defaults. |
|
Took a different approach: all customized CPU/memory limits and memory requests have been removed, so the installer default values are used instead. The CPU requests are still customized because they were directly preventing scheduling in CI. This approach tested OK with Scanner V4 enabled for all CI. It will be tested again in a future PR when the Scanner V4 switch is officially turned on. |
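To make the approach above concrete, here is a minimal Python sketch, not the PR's actual change. It assumes the overrides are shaped like a Kubernetes `resources` block. The helper name `keep_only_cpu_requests` and all numeric values are hypothetical, chosen only for illustration. It drops every limit and the memory request, keeping only the CPU request so that the installer defaults apply to everything else.

```python
def keep_only_cpu_requests(resources):
    """Return a new resources dict that keeps only requests.cpu.

    Limits and memory requests are dropped so that the installer
    defaults would apply to them. Hypothetical helper for illustration.
    """
    cpu = resources.get("requests", {}).get("cpu")
    if cpu is None:
        # Nothing to override: fall back entirely to installer defaults.
        return {}
    return {"requests": {"cpu": cpu}}


# Hypothetical CI override; these values do not come from the PR.
ci_override = {
    "requests": {"cpu": "400m", "memory": "1Gi"},
    "limits": {"cpu": "2", "memory": "4Gi"},
}

trimmed = keep_only_cpu_requests(ci_override)
print(trimmed)
```

The same trimming would apply per component (indexer, matcher, DB) and per deploy path. How each path actually expresses the override is not shown here.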
Description
Scanner V4 components (indexer, matcher, DB) were being OOM-killed and/or failing to be scheduled (node CPU unavailable) with the existing resource allocations. This PR makes adjustments to the settings to help stabilize the deployments.
Each deploy path (manifest, Helm, and operator) had its own tuning mechanism.
User-facing documentation
Testing and quality
Automated testing
The changes themselves are tests
How I validated my change
Against StackRox Scanner, these changes will be tested by CI as part of this PR.
Against Scanner V4, these changes were validated in #19236 and will be validated again in a future PR when Scanner V4 is officially turned on in CI.