Draft
Commits
48 commits
79d020c
Add design spec for conditional init() execution in busybox binary
janisz Apr 13, 2026
ce0bf8d
Add implementation plan for conditional init() execution
janisz Apr 13, 2026
99f16b5
feat: create central/app package structure
janisz Apr 13, 2026
391316f
docs: analyze sensor → central import chains
janisz Apr 13, 2026
6834046
feat: establish app/ structure for config-controller
janisz Apr 13, 2026
29b615f
fix: move admission-control init() to explicit initialization
janisz Apr 13, 2026
2cf4237
refactor: add GraphQL loader init structure (stub)
janisz Apr 13, 2026
2af5b79
refactor: add compliance init structure (stub)
janisz Apr 13, 2026
eb3f1e0
refactor: move Central metrics from init() to explicit registration
janisz Apr 13, 2026
bdfdc77
refactor: move sensor metrics init() to explicit initialization
janisz Apr 13, 2026
2287e20
docs: add verification report and architecture guide
janisz Apr 13, 2026
0449ea6
docs: Phase 5 low-hanging fruit analysis and migration plan
janisz Apr 13, 2026
8b5a49f
docs: add busybox-scoped Phase 5 recommendations
janisz Apr 13, 2026
eca4ef0
docs: add heap profile component labeling fix
janisz Apr 13, 2026
d3de83e
feat: add component labeling for heap/CPU profiles
janisz Apr 13, 2026
987c76a
docs: update heap profile labeling doc with implementation details
janisz Apr 13, 2026
82a6968
refactor: minimize metrics init diff by keeping logic in metrics pack…
janisz Apr 13, 2026
c78f077
chore: remove documentation files
janisz Apr 13, 2026
93f7842
refactor: remove app/init.go files, call metrics.Init directly from a…
janisz Apr 13, 2026
db6b9d5
fix: update metric Init() comments to reference app.go
janisz Apr 13, 2026
f1a6ada
refactor: migrate GraphQL loaders and compliance checks to explicit I…
janisz Apr 13, 2026
6744213
refactor: rename init() to register*() in kubernetes compliance checks
janisz Apr 13, 2026
9b30c56
refactor: rename init() to Register*() in hipaa_164 compliance checks
janisz Apr 13, 2026
f24eb46
refactor: rename init() to Register*() in nist80053 compliance checks
janisz Apr 13, 2026
668c7ef
refactor: rename init() to Register*() in nist800-190 compliance checks
janisz Apr 13, 2026
0a704c0
refactor: rename init() to Register*() in pcidss32 compliance checks
janisz Apr 13, 2026
b84cb73
refactor: replace blank imports with explicit Init() calls in complia…
janisz Apr 13, 2026
3b1086c
refactor: rename init() to Register*() in central hipaa_164 complianc…
janisz Apr 13, 2026
530b499
refactor: rename init() to Register*() in central nist800-190 complia…
janisz Apr 13, 2026
f487fd3
refactor: rename init() to Register*() in central nist80053 complianc…
janisz Apr 13, 2026
4e8afdd
refactor: rename init() to Register*() in central pcidss32 compliance…
janisz Apr 13, 2026
f8083d4
refactor: rename init() to Init() in central remote compliance checks
janisz Apr 13, 2026
4c0dbf7
refactor: replace blank imports with explicit Init() in central compl…
janisz Apr 13, 2026
998006c
refactor: rename init() to Register*() in all notifier factories
janisz Apr 13, 2026
3ab5d12
refactor: rename init() to Register*() in compliance standards metadata
janisz Apr 13, 2026
5dfcdc9
refactor: rename init() to Register*() in external backup plugins
janisz Apr 13, 2026
26ed417
refactor: migrate init() to explicit Init() pattern and centralize pr…
janisz Apr 14, 2026
e9f029f
fix: break import cycle in sensor telemetry gatherers
janisz Apr 14, 2026
4eb3d12
fix: resolve golangci-lint failures from init() migration
janisz Apr 14, 2026
51425e9
fix: expand gochecknoinits exclusion to cover all legacy directories
janisz Apr 14, 2026
441a472
refactor: migrate init() to explicit Init() pattern across all compon…
janisz Apr 14, 2026
d62a670
refactor: migrate init() to explicit Init() in roxctl, sensor, tools,…
janisz Apr 14, 2026
a7c0386
style: fix gofmt formatting in volume converter files
janisz Apr 14, 2026
0d397c9
config: add pkg/images/enricher/metadata.go to gochecknoinits exclusion
janisz Apr 14, 2026
0cb5d2a
refactor: migrate migrator init() to explicit Register() pattern
janisz Apr 14, 2026
6fceb99
refactor: migrate remaining 9 pkg/ init() functions to explicit Init()
janisz Apr 14, 2026
dd31084
refactor: unexport centralRun - only used within main package
janisz Apr 14, 2026
aeee0e4
fix: apply critical fixes from split PRs to main branch
janisz Apr 15, 2026
Add design spec for conditional init() execution in busybox binary
Problem: PR #19819's busybox consolidation causes all 535 init() functions
to run for every component, leading to OOMKills in config-controller and
admission-control under the race detector.

Solution: Move high-impact init() logic (~160 files) from package-level
init() to explicit component-specific initialization functions called from
app.Run(). Focuses on prometheus metrics, compliance checks, and GraphQL
loaders.

Targets config-controller (128 Mi, 7 OOMKills) and admission-control
(500 Mi, 6-7 OOMKills per replica) for immediate fixes, with phased
rollout for other optimizations.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
janisz and claude committed Apr 13, 2026
commit 79d020cae660ef93d80a87a0ff9b243b2c0cd324
324 changes: 324 additions & 0 deletions docs/superpowers/specs/2026-04-13-conditional-init-design.md
@@ -0,0 +1,324 @@
# Conditional Init() Execution for BusyBox-Style Binary

## Context

**Problem:** PR #19819 (merged April 8, 2024) consolidated all StackRox binaries into a single busybox-style binary. Because the Go runtime runs the init() functions of every package linked into a binary before main() starts, ALL package init() functions now execute regardless of which component runs (central, sensor, admission-control, config-controller, etc.).

**Impact:** Under the race detector (~10x memory multiplier), components with tight memory limits experience OOMKills:
- **config-controller** (128 Mi limit): 7 OOMKills, heap grew 61 MB → 227 MB
- **admission-control** (500 Mi limit): 6-7 OOMKills per replica, similar memory growth
- **Root cause:** All 535 init() functions run for every component, including:
  - 45+ central prometheus metrics (central-only)
  - 50+ sensor prometheus metrics (sensor-only)
  - 109 compliance check registrations (central-only)
  - 15+ GraphQL loader registrations (central-only)

**Goal:** Make initialization conditional based on which component is running. Components should only run their own init() logic, not all 535 init() functions.

**Approach:** Hybrid migration - move high-impact init() logic (~160 files) from package-level init() functions into explicit component-specific initialization functions called from app.Run().

## Architecture

**Three-Layer Model:**
```
main.go (dispatcher)
  → routes to component via os.Args[0]

component/app/app.go
  → Run() calls component-specific init functions

component/app/init.go (NEW)
  → initMetrics(), initCompliance(), initGraphQL(), etc.
  → replaces package-level init() functions
```
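
The dispatcher layer can be sketched as a lookup on the base name of `os.Args[0]`. This is a minimal illustration under stated assumptions, not the dispatcher from PR #19819: the `runFunc` type, the `components` map, and the `lookup` helper are hypothetical names, and the entry points are placeholders.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// runFunc is a hypothetical signature for a component entry point such as app.Run.
type runFunc func() error

// components maps binary names to entry points. The keys are component names
// from this spec; the function bodies are placeholders.
var components = map[string]runFunc{
	"central":           func() error { return nil },
	"admission-control": func() error { return nil },
	"config-controller": func() error { return nil },
}

// lookup resolves argv[0], which may be a full path, to a component entry point.
func lookup(argv0 string) (runFunc, bool) {
	run, ok := components[filepath.Base(argv0)]
	return run, ok
}

func main() {
	// A real dispatcher would pass os.Args[0]; fixed samples keep the sketch deterministic.
	for _, argv0 := range []string{"/stackrox/central", "/stackrox/unknown"} {
		run, ok := lookup(argv0)
		if !ok {
			fmt.Printf("%s: unknown component\n", argv0)
			continue
		}
		fmt.Printf("%s: run returned %v\n", argv0, run())
	}
}
```

Keeping the lookup separate from `main` makes the routing testable without re-executing the binary under different names.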

**Key Principle:** Move from implicit (package init()) to explicit (component initialization functions).

## Implementation Plan

### Phase 1: Infrastructure Setup (app/ packages)

**Goal:** Ensure all components have app/ package structure.

**Tasks:**
1. Verify which components already have app/ from PR #19819
2. Create missing app/ packages for components that need them:
   - `central/app/app.go` + `central/app/init.go`
   - `sensor/kubernetes/app/app.go` + `sensor/kubernetes/app/init.go`
   - `sensor/admission-control/app/app.go` + `sensor/admission-control/app/init.go`
   - `config-controller/app/app.go` + `config-controller/app/init.go`
   - Verify: `migrator/app/`, `compliance/cmd/compliance/app/`
3. Move existing main() logic to app.Run() where needed
4. Verify central/main.go dispatcher calls all app.Run() functions

**Files to modify:**
- New: `central/app/app.go`, `central/app/init.go`
- New: `sensor/kubernetes/app/init.go` (app.go may exist)
- New: `sensor/admission-control/app/init.go` (app.go may exist)
- New: `config-controller/app/init.go` (app.go may exist)
- Verify: `central/main.go` (dispatcher should already exist from PR #19819)

**Verification:**
- All components build successfully
- Dispatcher routing still works (os.Args[0] check)
- No behavior changes yet (this is structure-only)

### Phase 2: Critical Path (OOMKill Fixes)

**Goal:** Fix OOMKills in config-controller and admission-control.

**Priority Order:**
1. **config-controller** (128 Mi limit, 7 restarts)
   - Create init.go with minimal init functions
   - Break import chains to central packages

2. **admission-control** (500 Mi limit, 6-7 restarts)
   - Create init.go with initMetrics()
   - Move `sensor/admission-control/manager/metrics.go` init() logic

3. **Break sensor → central import chains**
   - Ensure sensor components don't import:
     - `central/compliance/checks`
     - `central/graphql`
     - `central/metrics`
   - This prevents sensor from loading central's heavy init() functions
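
One way to keep these import chains from creeping back is a small check over the transitive dependency list of a sensor package (for example, the output of `go list -deps`). The sketch below is illustrative only: the module path prefix `github.com/stackrox/rox` and the helper name `violations` are assumptions, and the sample dependency list is made up.

```go
package main

import (
	"fmt"
	"strings"
)

// forbiddenPrefixes lists the central packages named in this spec.
// The module path prefix is an assumption for illustration.
var forbiddenPrefixes = []string{
	"github.com/stackrox/rox/central/compliance/checks",
	"github.com/stackrox/rox/central/graphql",
	"github.com/stackrox/rox/central/metrics",
}

// violations returns every dependency that is a forbidden package or one of
// its subpackages. Matching on prefix+"/" avoids false hits on sibling
// packages that merely share a name prefix.
func violations(deps []string) []string {
	var bad []string
	for _, dep := range deps {
		for _, prefix := range forbiddenPrefixes {
			if dep == prefix || strings.HasPrefix(dep, prefix+"/") {
				bad = append(bad, dep)
				break
			}
		}
	}
	return bad
}

func main() {
	// Hypothetical sample; in practice this would be read from `go list -deps`.
	deps := []string{
		"github.com/stackrox/rox/sensor/common/metrics",
		"github.com/stackrox/rox/central/graphql/resolvers/loaders",
	}
	for _, dep := range violations(deps) {
		fmt.Println("forbidden import:", dep)
	}
}
```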

**Files to modify:**
- New: `config-controller/app/init.go`
- New: `sensor/admission-control/app/init.go`
- Modify: `sensor/admission-control/app/app.go` (call initMetrics())
- Remove init() from: `sensor/admission-control/manager/metrics.go`

**Critical Success Metric:** Zero OOMKills in config-controller and admission-control after this phase.

### Phase 3: Central Initialization Migration

**Goal:** Migrate central's high-impact init() functions to explicit initialization.

**Target Init Functions (160 files):**

1. **Prometheus metrics** (28+ central metric files)
   - Primary: `central/metrics/init.go` (45+ metrics)
   - Others: compliance, debug, scanner definitions, detection, etc.
   - Pattern: Move prometheus.MustRegister() calls to initMetrics()

2. **Compliance checks** (109 files)
   - `central/compliance/checks/remote/all.go`
   - `central/compliance/checks/nist80053/*.go` (20+ files)
   - `central/compliance/checks/pcidss32/*.go` (25+ files)
   - `central/compliance/checks/hipaa_164/*.go` (15+ files)
   - `pkg/compliance/checks/kubernetes/*.go` (32 files)
   - Pattern: Consolidate framework.MustRegisterChecks() into initCompliance()

3. **GraphQL loaders** (15+ files)
   - `central/graphql/resolvers/loaders/*.go`
   - Files: policies.go, deployments.go, images.go, namespaces.go, nodes.go, etc.
   - Pattern: Move RegisterTypeFactory() calls to initGraphQL()

4. **Compliance standards** (5 files)
   - `central/compliance/standards/metadata/*.go`
   - Files: cis_kubernetes.go, hipaa_164.go, nist_800_53.go, nist_800_190.go, pci_dss_3_2.go
   - Pattern: Move AllStandards append logic to initComplianceStandards()

**Implementation:**

Create `central/app/init.go`:
```go
package app

func initMetrics() {
	// Move code from central/metrics/init.go
	prometheus.MustRegister(/* 45+ central metrics */)
}

func initCompliance() {
	// Consolidate 109 compliance check registrations
	framework.MustRegisterChecks(/* all checks */)
}

func initGraphQL() {
	// Move 15+ loader registrations
	RegisterTypeFactory(/* loaders */)
}

func initComplianceStandards() {
	// Move compliance standard metadata
}
```

Modify `central/app/app.go`:
```go
func Run() {
	memlimit.SetMemoryLimit()
	premain.StartMain()

	// NEW: Explicit initialization
	initMetrics()
	initCompliance()
	initGraphQL()
	initComplianceStandards()

	// ... existing central startup logic
}
```

**Files to modify:**
- New: `central/app/init.go` (consolidates 160+ init functions)
- Modify: `central/app/app.go` (add init function calls)
- Remove init() from: 28+ metric files, 109 compliance check files, 15+ loader files, 5 standard files
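
Because explicit init functions can be reached from more than one call path (for example, `Run()` and a test helper), it may be worth guarding them so that a second call does not trigger a duplicate-registration panic. The following is a stdlib-only sketch of that idea: the `registry` type stands in for a real registry such as the compliance check registry, and all names and check IDs in it are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a stand-in for a real registry (hypothetical type).
// Like many Must* registration APIs, it panics on duplicates.
type registry struct {
	mu    sync.Mutex
	names map[string]struct{}
}

// mustRegister adds name to the registry and panics if it is already present.
func (r *registry) mustRegister(name string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.names == nil {
		r.names = make(map[string]struct{})
	}
	if _, dup := r.names[name]; dup {
		panic("duplicate registration: " + name)
	}
	r.names[name] = struct{}{}
}

// count reports how many names have been registered.
func (r *registry) count() int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return len(r.names)
}

var (
	checks         registry
	complianceOnce sync.Once
)

// initCompliance performs the registrations that package-level init()
// functions used to do. sync.Once makes repeated calls harmless.
func initCompliance() {
	complianceOnce.Do(func() {
		// Placeholder check IDs for illustration.
		for _, name := range []string{"check-a", "check-b"} {
			checks.mustRegister(name)
		}
	})
}

func main() {
	initCompliance()
	initCompliance() // no-op: registrations already ran
	fmt.Println("registered checks:", checks.count())
}
```

Nothing is registered until a component calls `initCompliance()`, which is the property this migration relies on: merely linking the package costs no registration work.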

### Phase 4: Sensor Initialization Migration

**Goal:** Migrate sensor's init() functions to explicit initialization.

**Target Init Functions:**

1. **Prometheus metrics** (11+ sensor metric files)
   - Primary: `sensor/common/metrics/init.go` (50+ metrics)
   - Others: detector, pubsub, networkflow, centralproxy, VM metrics, etc.
   - Pattern: Move prometheus.MustRegister() calls to initMetrics()

**Implementation:**

Add to `sensor/kubernetes/app/init.go` (created in Phase 1):
```go
package app

func initMetrics() {
	// Move code from sensor/common/metrics/init.go
	prometheus.MustRegister(/* 50+ sensor metrics */)
}
```

Modify `sensor/kubernetes/app/app.go`:
```go
func Run() {
	memlimit.SetMemoryLimit()
	premain.StartMain()

	// NEW: Explicit initialization
	initMetrics()

	// ... existing sensor startup logic
}
```

**Files to modify:**
- Modify: `sensor/kubernetes/app/init.go` (add initMetrics())
- Modify: `sensor/kubernetes/app/app.go` (call initMetrics())
- Remove init() from: `sensor/common/metrics/init.go` and 10+ other sensor metric files

### Phase 5: Low-Hanging Fruit Migration

**Goal:** Opportunistically migrate remaining easy-to-move init() functions.

**Target Categories:**

1. **Simple Registry Registrations**
   - `pkg/booleanpolicy/violationmessages/printer/gen-registrations.go` (100+ printer registrations)
   - `pkg/search/enumregistry/enum_registry.go` (enum map initialization)
   - Scanner-specific inits (scanner/enricher/nvd/nvd.go, etc.)

2. **Large Static Data Initialization**
   - `pkg/search/options.go` (37KB file, large map initialization)
   - `central/alert/mappings/options.go` (builds OptionsMap)

3. **Component-Specific Package Inits**
   - Operator scheme registrations
   - Roxctl-specific inits

**Approach:** Migrate these over time as we touch related code, or batch when convenient.

**Expected Coverage:**
- Phases 2-4: ~160 high-impact init() functions (fixes OOMKills)
- Phase 5: Additional ~40-90 functions (further optimization)
- Total migrated: ~200-250 of 535 init() functions

**Remaining:** ~300 init() functions are either truly shared, negligible impact, or have complex dependencies (defer to future work).

## Migration Pattern

**Before** (package-level init):
```go
// central/metrics/init.go
package metrics

var AlertProcessingDuration = prometheus.NewHistogramVec(...)

func init() {
	prometheus.MustRegister(AlertProcessingDuration)
}
```

**After** (explicit initialization):
```go
// central/metrics/metrics.go
package metrics

var AlertProcessingDuration = prometheus.NewHistogramVec(...)
// No init() function

// central/app/init.go
package app

func initMetrics() {
	prometheus.MustRegister(metrics.AlertProcessingDuration)
}
```

## Critical Files

**Dispatcher:**
- `central/main.go` - busybox dispatcher (verify only, should be from PR #19819)

**App packages to create/verify:**
- `central/app/app.go` + `central/app/init.go`
- `sensor/kubernetes/app/app.go` + `sensor/kubernetes/app/init.go`
- `sensor/admission-control/app/app.go` + `sensor/admission-control/app/init.go`
- `config-controller/app/app.go` + `config-controller/app/init.go`

**High-impact init() files to migrate (~160 files):**
- Central metrics: `central/metrics/init.go` + 27 others
- Sensor metrics: `sensor/common/metrics/init.go` + 10 others
- Compliance checks: 109 files in `central/compliance/checks/` and `pkg/compliance/checks/`
- GraphQL loaders: 15+ files in `central/graphql/resolvers/loaders/`
- Compliance standards: 5 files in `central/compliance/standards/metadata/`

## Verification

**CI Testing:**
- Build all components successfully
- Run existing test suites
- Deploy to test cluster with race detector enabled
- Monitor for OOMKills in config-controller and admission-control

**Expected Memory Impact:**

| Component | Current (race) | Target (race) | OOMKills Before | OOMKills After |
|-----------|---------------|---------------|-----------------|----------------|
| config-controller | ~150 MB (OOM @ 128 Mi) | < 100 MB | 7 | 0 |
| admission-control | ~600 MB (OOM @ 500 Mi) | < 400 MB | 6-7 per replica | 0 |
| central | 224 MB | ~224 MB (unchanged) | 0 | 0 |
| sensor | 227 MB | ~100 MB | 0 | 0 |

**Success Criteria:**
- Phase 2: Zero OOMKills in config-controller and admission-control
- Phase 3-4: Memory usage returns to pre-busybox levels for all components
- All phases: No functional regressions, all tests pass
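
To compare memory before and after a phase, a component could log its live heap right after initialization and check it against a budget. The sketch below uses only the standard `runtime` package; the helper names and the 0.8 headroom fraction are assumptions, and the 128 Mi figure is the config-controller limit from this spec. Note that Go heap size is only a proxy: container OOMKills are driven by total resident memory, which also includes stacks, runtime overhead, and race detector shadow memory.

```go
package main

import (
	"fmt"
	"runtime"
)

const mib = 1 << 20

// heapAllocMiB returns the bytes of live heap objects in MiB. It forces a GC
// first so that garbage left over from start-up does not inflate the reading.
func heapAllocMiB() float64 {
	runtime.GC()
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return float64(m.HeapAlloc) / mib
}

// withinBudget reports whether usedMiB stays at or below fraction of limitMiB.
func withinBudget(usedMiB, limitMiB, fraction float64) bool {
	return usedMiB <= limitMiB*fraction
}

func main() {
	used := heapAllocMiB()
	// 128 Mi is the config-controller limit; the 0.8 headroom is an assumption.
	fmt.Printf("heap after init: %.1f MiB, within budget: %v\n",
		used, withinBudget(used, 128, 0.8))
}
```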

## Rollout Strategy

**Merge Strategy:** Each phase merges independently
- Phase 1: Infrastructure, no behavior change, low risk
- Phase 2: Critical OOMKill fixes, high priority, merge ASAP
- Phase 3-4: Optimizations, merge after validation
- Phase 5: Opportunistic, merge when convenient

**Validation Between Phases:**
1. Merge to master
2. Wait for nightly build
3. Monitor admission-control/config-controller restart counts
4. Verify memory profiles
5. Proceed to next phase after validation

**Rollback Plan:** The explicit initialization calls live in the app/init.go files, so an individual init function can be reverted (together with restoring the package-level init() it replaced) without reverting the entire change.