Skip to content

Fix waf.updates metric success:N/A tag from sequential counter metric values#11604

Open
jandro996 wants to merge 2 commits into
masterfrom
alejandro.gonzalez/APPSEC-62740
Open

Fix waf.updates metric success:N/A tag from sequential counter metric values#11604
jandro996 wants to merge 2 commits into
masterfrom
alejandro.gonzalez/APPSEC-62740

Conversation

@jandro996

@jandro996 jandro996 commented Jun 9, 2026

Copy link
Copy Markdown
Member

What Does This Do

  • Removes static AtomicInteger counters (wafInitCounter, wafUpdatesCounter) from WafMetricCollector and uses 1L as the metric value for both wafInit() and wafUpdates(), matching RFC semantics (COUNT-1-per-event)
  • In AppSecConfigServiceImpl.handleWafUpdateResultReport(), updates currentRuleVersion from e.wafDiagnostics.rulesetVersion in the InvalidRuleSetException catch block so wafUpdates() emits with the correct event_rules_version even when a partial WAF config error occurs
  • Changes log level for InvalidRuleSetException during WAF config updates from debug to warn
  • Adds a regression test verifying currentRuleVersion is populated from diagnostics when InvalidRuleSetException is thrown during a remote config update

Motivation

appsec.waf.updates was reporting success:N/A for approximately 10% of events.

The root cause was the use of sequential AtomicInteger counters (1, 2, 3, ...) as the metric value. The telemetry backend aggregates metrics with the same tags across polling windows. When the rawMetricsQueue (capacity 1024) is full and a metric offer is dropped, the counter has already been incremented — the next successful emission sends value N+1 with no preceding N, creating a gap the backend interprets as success:N/A.

The RFC specifies waf.init and waf.updates as COUNT-1-per-event metrics: each emission should carry value 1, not a cumulative count.

The secondary fix ensures currentRuleVersion is set from WAF diagnostics even when InvalidRuleSetException is thrown, so wafUpdates() carries the correct event_rules_version tag in error scenarios.

Additional Notes

The condition (!defaultConfigActivated || currentRuleVersion == null) in the error path mirrors the guard used in the success path: it prevents overwriting a user-customized rule version with a default config version, while still allowing the version to be set on the first update or when the default config is active.

Contributor Checklist

Jira ticket: APPSEC-62740

Note: Once your PR is ready to merge, add it to the merge queue by commenting /merge. /merge -c cancels the queue request. /merge -f --reason "reason" skips all merge queue checks; please use this judiciously, as some checks do not run at the PR-level. For more information, see this doc.

@datadog-datadog-prod-us1-2

This comment has been minimized.

@jandro996 jandro996 added type: enhancement Enhancements and improvements comp: asm waf Application Security Management (WAF) comp: telemetry Telemetry labels Jun 9, 2026
…of sequential counters

Removes wafInitCounter and wafUpdatesCounter AtomicInteger fields from
WafMetricCollector. Both wafInit() and wafUpdates() now use literal 1L,
matching RFC semantics (COUNT-1-per-event). The old sequential counters
(1, 2, 3, ...) caused success:N/A when the rawMetricsQueue was full and
a metric was dropped — the counter advanced but the offer() silently
failed, creating gaps the backend interpreted as success:N/A.

Also updates currentRuleVersion from e.wafDiagnostics.rulesetVersion in
the InvalidRuleSetException catch block of handleWafUpdateResultReport(),
so wafUpdates() emits with the correct event_rules_version even on
partial WAF config errors. Raises log level from debug to warn.
@jandro996 jandro996 force-pushed the alejandro.gonzalez/APPSEC-62740 branch from a1992d9 to a063275 Compare June 9, 2026 09:58
@dd-octo-sts

dd-octo-sts Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

🟢 Java Benchmark SLOs — All performance SLOs passed

Suite Status
Startup 🟢 pass

SLO thresholds are defined here based on automatically generated metrics. A warning is raised when results are within 5% of the threshold.

PR vs. master results
Scenario Candidate master Δ (95% CI of mean)
startup:insecure-bank:iast:Agent 13.90 s 13.93 s [-0.9%; +0.6%] (no difference)
startup:insecure-bank:tracing:Agent 12.88 s 12.99 s [-1.7%; +0.0%] (no difference)
startup:petclinic:appsec:Agent 16.93 s 16.77 s [-0.4%; +2.4%] (no difference)
startup:petclinic:iast:Agent 16.91 s 16.99 s [-1.9%; +1.0%] (no difference)
startup:petclinic:profiling:Agent 16.84 s 16.97 s [-2.4%; +0.8%] (no difference)
startup:petclinic:sca:Agent 16.93 s 16.74 s [-0.4%; +2.7%] (no difference)
startup:petclinic:tracing:Agent 15.87 s 15.85 s [-1.3%; +1.6%] (no difference)

Commit: fa4d6840 · CI Pipeline · Benchmarking Platform UI


Load and DaCapo benchmarks can be triggered manually in the GitLab pipeline. Results will appear in the Benchmarking Platform UI after completion.

@jandro996

Copy link
Copy Markdown
Member Author

@codex review

@jandro996 jandro996 marked this pull request as ready for review June 9, 2026 11:50
@jandro996 jandro996 requested a review from a team as a code owner June 9, 2026 11:50

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a06327584d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

&& !e.wafDiagnostics.rulesetVersion.isEmpty()
&& (!defaultConfigActivated || currentRuleVersion == null)) {
currentRuleVersion = e.wafDiagnostics.rulesetVersion;
statsReporter.setRulesVersion(currentRuleVersion);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid publishing failed rules as the active version

When an RC update throws InvalidRuleSetException before loading any valid rules, this path still updates statsReporter to the new diagnostics version. WAFStatsReporter.processTraceSegment then tags every subsequent trace with _dd.appsec.event_rules.version whenever rulesVersion is set, even though the WAF continues running the previous/default rules; the success path avoids this by requiring !wafDiagnostics.rules.getLoaded().isEmpty(), so the error path should keep the same guard for trace tagging or otherwise separate the failed-update metric version from the active rules version.

Useful? React with 👍 / 👎.

noExceptionThrown()
}

void 'currentRuleVersion is updated from diagnostics when InvalidRuleSetException is thrown on RC update'() {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Move the new regression test to JUnit 5

The root AGENTS.md for this repo says, “Test frameworks: Always use JUnit 5. Do not write new Groovy / Spock tests,” but this change adds a new Spock test method to the Groovy spec. Even though the surrounding file is existing Groovy, new coverage should be added as a JUnit 5 test (or the relevant spec migrated) to avoid expanding the deprecated test style.

Useful? React with 👍 / 👎.

…loaded on error

On InvalidRuleSetException, always update currentRuleVersion and
setRuleVersion() on modules so wafUpdates() emits with the correct
event_rules_version. But only update statsReporter (trace tagging) if
rules.getLoaded() is non-empty, mirroring the success path guard.
This avoids tagging traces with a version whose rules failed to load.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp: asm waf Application Security Management (WAF) comp: telemetry Telemetry type: enhancement Enhancements and improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant