Fix DD_APM_TRACING_ENABLED to work with LLMObs#10989
Fix DD_APM_TRACING_ENABLED to work with LLMObs#10989matsumo-and wants to merge 7 commits intoDataDog:masterfrom
Conversation
Signed-off-by: matsumo-and <yh134.toisanda@gmail.com>
Fixes DataDog#10051 When DD_APM_TRACING_ENABLED=false is set, APM tracing should be disabled while allowing other products like LLM Observability to function. Previously, setting DD_APM_TRACING_ENABLED=false would inadvertently disable LLMObs or allow APM traces to leak through when no other products were enabled. Signed-off-by: matsumo-and <yh134.toisanda@gmail.com>
Signed-off-by: matsumo-and <yh134.toisanda@gmail.com>
|
Hi @smola , just a gentle ping. |
| if (!config.isApmTracingEnabled()) { | ||
| if (config.isLlmObsEnabled()) { | ||
| log.debug("APM is disabled, but LLMObs is enabled. All LLMObs traces will be kept."); | ||
| return new LlmObsStandaloneSampler(); |
There was a problem hiding this comment.
It appears this would break ASM. If ASM is enabled, we still need to pass, at least, 1 APM/ASM per minute.
There was a problem hiding this comment.
There was a problem hiding this comment.
Asking @DataDog/asm-java for another look to make sure the ASM sampling will be fine.
When APM tracing is disabled but both LLMObs and ASM are enabled, the previous code returned LlmObsStandaloneSampler which never sent the 1 APM trace/minute required by ASM for billing and service catalog. Introduce LlmObsAndAsmStandaloneSampler that keeps all LLMObs and ASM traces while rate-limiting plain APM traces to 1 per minute. Also clarify the log message for the ASM-only standalone case. Signed-off-by: matsumo-and <yh134.toisanda@gmail.com>
|
Sorry for the late review :( I think the PR fixes the immediate issue correctly, but it introduces a structural technical debt. This is something we had in mind when we started with the standalone implementations, but we didn’t move forward at the time since no other products needed it. The current approach introduces a new sampler class per product combination:
The next standalone product will require yet another class and another branch. Suggested alternative (just an example) Each product has three properties, its enum StandaloneProduct {
ASM (ProductTraceSource.ASM, SamplingMechanism.APPSEC, true),
LLMOBS(ProductTraceSource.LLMOBS, SamplingMechanism.DEFAULT, false);
}A single StandaloneSampler receives the list of active products, iterates them to detect product-marked traces, and falls back to rate-limiting (or drop) depending on whether any active product requires the billing trace. forConfig becomes flat and open for extension: if (!config.isApmTracingEnabled()) {
List<StandaloneProduct> active = new ArrayList<>();
if (isAsmEnabled(config)) active.add(StandaloneProduct.ASM);
if (config.isLlmObsEnabled()) active.add(StandaloneProduct.LLMOBS);
// next product: one line here + one entry in the enum
return active.isEmpty()
? new ForcePrioritySampler(SAMPLER_DROP, DEFAULT)
: new StandaloneSampler(active, Clock.systemUTC());
}The same generalisation applies to the manual && per product in Maybe I’m missing some context since this is an older topic, but what do you think about the suggestion? Do you see it as viable? cc:@smola |
|
Thanks for the detailed feedback, @jandro996 — this makes a lot of sense. I initially kept the change minimal to address the immediate issue, but I agree that the current approach doesn’t scale well as more standalone products are added, and your suggestion of using a single Also, good point about applying the same generalization to I’m happy to refactor the implementation along these lines. |
Replace the three separate sampler classes (AsmStandaloneSampler, LlmObsStandaloneSampler, LlmObsAndAsmStandaloneSampler) with a single StandaloneSampler that iterates over a list of active StandaloneProduct entries, making it trivially extensible to future products. Also add ProductTraceSource.isAnyStandaloneProductMarked() to simplify the TraceCollector force-keep bypass check.
|
Hi @jandro996, I've refactored the implementation along your suggestion:
cc: @smola |
What Does This Do
This PR fixes two issues when using LLM Observability without APM tracing:
DD_APM_TRACING_ENABLED=falseproperly drop APM traces while keeping LLMObs tracesNullPointerExceptionwhenDD_TRACE_ENABLED=falseis set with LLMObs enabledMotivation
Solves #10051
Currently, when
DD_APM_TRACING_ENABLED=falseis set withDD_LLMOBS_ENABLED=true, APM traces are still sent to the agent because the sampler selection logic falls through to the default sampler, which keeps all traces by default.Additionally, setting
DD_TRACE_ENABLED=falsecauses NPE in LLMObs methods because LLMObs tries to initialize even when all tracing is disabled.Additional Notes
Implementation approach:
TraceSegment.setTagTop()to propagate the LLMOBS flag from child spans to the root span, following the exact same pattern used by ASM (AppSecEventTracker, GatewayBridge, IAST Reporter)LlmObsStandaloneSamplerwhenDD_APM_TRACING_ENABLED=falseandDD_LLMOBS_ENABLED=trueLLMObsSystem.start()when!config.isTraceEnabled()Testing:
dd-smoke-tests/apm-tracing-disabled/Contributor Checklist
type:and (comp:orinst:) labels in addition to any other useful labelsclose,fix, or any linking keywords when referencing an issueUse
solvesinstead, and assign the PR milestone to the issueNote: Once your PR is ready to merge, add it to the merge queue by commenting
/merge./merge -ccancels the queue request./merge -f --reason "reason"skips all merge queue checks; please use this judiciously, as some checks do not run at the PR-level. For more information, see this doc.