Skip to content

Fix DD_APM_TRACING_ENABLED to work with LLMObs#10989

Open
matsumo-and wants to merge 7 commits intoDataDog:masterfrom
matsumo-and:fix/LLMOBS-10051
Open

Fix DD_APM_TRACING_ENABLED to work with LLMObs#10989
matsumo-and wants to merge 7 commits intoDataDog:masterfrom
matsumo-and:fix/LLMOBS-10051

Conversation

@matsumo-and
Copy link
Copy Markdown

@matsumo-and matsumo-and commented Mar 28, 2026

What Does This Do

This PR fixes two issues when using LLM Observability without APM tracing:

  1. Makes DD_APM_TRACING_ENABLED=false properly drop APM traces while keeping LLMObs traces
  2. Prevents NullPointerException when DD_TRACE_ENABLED=false is set with LLMObs enabled

Motivation

Solves #10051

Currently, when DD_APM_TRACING_ENABLED=false is set with DD_LLMOBS_ENABLED=true, APM traces are still sent to the agent because the sampler selection logic falls through to the default sampler, which keeps all traces by default.

Additionally, setting DD_TRACE_ENABLED=false causes NPE in LLMObs methods because LLMObs tries to initialize even when all tracing is disabled.

Additional Notes

Implementation approach:

  1. ProductTraceSource.LLMOBS flag (0x20): Added following the existing pattern for ASM/DSM/DBM products
  2. LlmObsStandaloneSampler: New sampler that drops all traces by default (SAMPLER_DROP) but keeps traces with the LLMOBS marker (SAMPLER_KEEP)
  3. DDLLMObsSpan marking: Uses TraceSegment.setTagTop() to propagate the LLMOBS flag from child spans to the root span, following the exact same pattern used by ASM (AppSecEventTracker, GatewayBridge, IAST Reporter)
  4. Sampler selection logic: Updated to choose LlmObsStandaloneSampler when DD_APM_TRACING_ENABLED=false and DD_LLMOBS_ENABLED=true
  5. NPE prevention: Added early return in LLMObsSystem.start() when !config.isTraceEnabled()

Testing:

  • Added smoke tests in dd-smoke-tests/apm-tracing-disabled/
  • Verified on master: LLMObs works but APM traces not dropped
  • Verified on fix branch: LLMObs works and APM traces properly dropped

Contributor Checklist

Note: Once your PR is ready to merge, add it to the merge queue by commenting /merge. /merge -c cancels the queue request. /merge -f --reason "reason" skips all merge queue checks; please use this judiciously, as some checks do not run at the PR-level. For more information, see this doc.

Signed-off-by: matsumo-and <yh134.toisanda@gmail.com>
Fixes DataDog#10051

When DD_APM_TRACING_ENABLED=false is set, APM tracing should be disabled
while allowing other products like LLM Observability to function. Previously,
setting DD_APM_TRACING_ENABLED=false would inadvertently disable LLMObs or
allow APM traces to leak through when no other products were enabled.

Signed-off-by: matsumo-and <yh134.toisanda@gmail.com>
Signed-off-by: matsumo-and <yh134.toisanda@gmail.com>
@matsumo-and matsumo-and marked this pull request as ready for review March 28, 2026 10:13
@matsumo-and matsumo-and requested review from a team as code owners March 28, 2026 10:13
@bric3 bric3 requested a review from smola April 2, 2026 08:36
@matsumo-and
Copy link
Copy Markdown
Author

Hi @smola , just a gentle ping.
I'd appreciate your review when you have time.
Thanks!

Comment thread dd-trace-core/src/main/java/datadog/trace/common/sampling/Sampler.java Outdated
if (!config.isApmTracingEnabled()) {
if (config.isLlmObsEnabled()) {
log.debug("APM is disabled, but LLMObs is enabled. All LLMObs traces will be kept.");
return new LlmObsStandaloneSampler();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears this would break ASM. If ASM is enabled, we still need to pass, at least, 1 APM/ASM per minute.

Copy link
Copy Markdown
Author

@matsumo-and matsumo-and Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smola

Thanks for pointing this out!
When both are enabled, LlmObsStandaloneSampler was returned and the ASM branch was never reached.

Fixed by adding LlmObsAndAsmStandaloneSampler that keeps all LLMObs/ASM traces while still allowing 1 APM trace per minute for billing.

69107cb

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Asking @DataDog/asm-java for another look to make sure the ASM sampling will be fine.

When APM tracing is disabled but both LLMObs and ASM are enabled,
the previous code returned LlmObsStandaloneSampler which never sent
the 1 APM trace/minute required by ASM for billing and service catalog.

Introduce LlmObsAndAsmStandaloneSampler that keeps all LLMObs and
ASM traces while rate-limiting plain APM traces to 1 per minute.

Also clarify the log message for the ASM-only standalone case.

Signed-off-by: matsumo-and <yh134.toisanda@gmail.com>
@Kyle-Verhoog Kyle-Verhoog added the LLMObs LLM Observability related label Apr 7, 2026
@matsumo-and matsumo-and requested a review from smola April 8, 2026 16:24
@jandro996
Copy link
Copy Markdown
Member

jandro996 commented Apr 16, 2026

Sorry for the late review :(

I think the PR fixes the immediate issue correctly, but it introduces a structural technical debt. This is something we had in mind when we started with the standalone implementations, but we didn’t move forward at the time since no other products needed it.

The current approach introduces a new sampler class per product combination:

  • APM disabled + ASM → AsmStandaloneSampler
  • APM disabled + LLMObs → LlmObsStandaloneSampler
  • APM disabled + ASM + LLMObs → LlmObsAndAsmStandaloneSampler

The next standalone product will require yet another class and another branch.

Suggested alternative (just an example)

Each product has three properties, its ProductTraceSource bit, the SamplingMechanism to use when keeping, and whether it needs the 1-per-minute billing trace:

enum StandaloneProduct {
     ASM   (ProductTraceSource.ASM,    SamplingMechanism.APPSEC,  true),
     LLMOBS(ProductTraceSource.LLMOBS, SamplingMechanism.DEFAULT, false);
 }

A single StandaloneSampler receives the list of active products, iterates them to detect product-marked traces, and falls back to rate-limiting (or drop) depending on whether any active product requires the billing trace. forConfig becomes flat and open for extension:

if (!config.isApmTracingEnabled()) {
      List<StandaloneProduct> active = new ArrayList<>();
      if (isAsmEnabled(config))     active.add(StandaloneProduct.ASM);
      if (config.isLlmObsEnabled()) active.add(StandaloneProduct.LLMOBS);
      // next product: one line here + one entry in the enum

      return active.isEmpty()
          ? new ForcePrioritySampler(SAMPLER_DROP, DEFAULT)
          : new StandaloneSampler(active, Clock.systemUTC());
  }

The same generalisation applies to the manual && per product in TraceCollector.setSamplingPriorityIfNecessary, a helper like ProductTraceSource.isAnyStandaloneProductMarked(traceSource) would consolidate that.

Maybe I’m missing some context since this is an older topic, but what do you think about the suggestion? Do you see it as viable?

cc:@smola

@matsumo-and
Copy link
Copy Markdown
Author

Thanks for the detailed feedback, @jandro996 — this makes a lot of sense.

I initially kept the change minimal to address the immediate issue, but I agree that the current approach doesn’t scale well as more standalone products are added, and your suggestion of using a single StandaloneSampler with a product-based configuration looks much cleaner and easier to extend.

Also, good point about applying the same generalization to TraceCollector. Consolidating the per-product checks into a helper like ProductTraceSource.isAnyStandaloneProductMarked would definitely improve maintainability and avoid duplicating logic.

I’m happy to refactor the implementation along these lines.

Replace the three separate sampler classes (AsmStandaloneSampler,
LlmObsStandaloneSampler, LlmObsAndAsmStandaloneSampler) with a single
StandaloneSampler that iterates over a list of active StandaloneProduct
entries, making it trivially extensible to future products.

Also add ProductTraceSource.isAnyStandaloneProductMarked() to simplify
the TraceCollector force-keep bypass check.
@matsumo-and
Copy link
Copy Markdown
Author

matsumo-and commented Apr 16, 2026

Hi @jandro996, I've refactored the implementation along your suggestion:

  • Replaced the three separate sampler classes with a single StandaloneSampler that takes a List
  • Added a StandaloneProduct enum holding each product's trace source bit, sampling mechanism, and billing trace flag
  • Added ProductTraceSource.isAnyStandaloneProductMarked() to simplify the check in TraceCollector
  • Sampler.Builder.forConfig() is now flat and open for extension — adding a new product is just one line in the enum and one line in the builder

6071b27

cc: @smola

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

LLMObs LLM Observability related

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants