fix(litellm): store bookkeeping span off-band, not in forwarded metadata #6598

Open
jgreer013 wants to merge 2 commits into getsentry:master from jgreer013:jgreer013/issue-6596

@jgreer013

Summary

Fixes #6596 (Linear PY-2540).

With LiteLLMIntegration enabled, any call that passes caller metadata could crash during request serialization:

TypeError: Object of type Span is not JSON serializable

Root cause

_input_callback stored the live span via _get_metadata_dict(kwargs)["_sentry_span"] = span. litellm threads the caller's metadata dict through to litellm_params["metadata"] — it's the same dict object the caller passed — and some providers (e.g. Anthropic's /v1/messages passthrough) forward that dict into the outbound request body. So the live Span landed at request_body["metadata"]["_sentry_span"] and json.dumps(request_body) failed before the request was sent.
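The failure mode can be reproduced without litellm or sentry_sdk. The sketch below is illustrative only: `Span`, `input_callback`, and the request body layout are stand-ins invented for the example, not the real classes or the provider's actual payload.

```python
import json


class Span:
    """Stand-in for a live tracing span; not the real sentry_sdk class."""


def input_callback(kwargs):
    # Mimics the pre-fix bookkeeping: stash the span inside the metadata dict.
    metadata = kwargs.setdefault("metadata", {})
    metadata["_sentry_span"] = Span()


caller_metadata = {"user_id": "u-123"}
kwargs = {"metadata": caller_metadata}  # same dict object the caller passed
input_callback(kwargs)

# A provider that forwards the caller's metadata into the outbound body.
request_body = {"model": "example-model", "metadata": caller_metadata}

try:
    json.dumps(request_body)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # Object of type Span is not JSON serializable
```

Because the callback mutates the caller's own dict, the span is visible to every later consumer of that dict, including code that serializes it.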

Separately, under send_default_pii=True + include_prompts=True, that span's gen_ai.request.messages holds the verbatim prompt, so any sink that serialized the injected metadata could leak prompt content to the provider.

Fix

Store the bookkeeping span off-band in a module-level registry keyed by litellm_call_id (a per-request UUID that is stable across the input/success/failure callbacks), falling back to the identity of the shared callback kwargs dict for direct callback invocations that omit it. The span no longer lives in any litellm-visible dict, so it can't be forwarded, serialized, or deep-copied by litellm — fixing both the crash and the prompt-leak vector. The registry entry is removed by the terminal success/failure callback (streaming success peeks and pops only on the final call).

Testing

  • New regression test test_caller_metadata_stays_json_serializable asserts the forwarded metadata stays free of _sentry_span and JSON-serializable at the serialization moment, while the span is still recorded off-band. It fails on master and passes with this change.
  • Full litellm suite: 160 passed (litellm v1.81.16), including streaming/async/failure paths.
  • Regression test also green on the oldest (v1.77.7) and latest litellm matrix entries.
  • mypy clean; ruff check + format clean.

Related (downstream, defense-in-depth)

litellm forwards/serializes this injected metadata into the provider body. Companion issue: BerriAI/litellm#30662. litellm has stripped such span objects in other logging paths (BerriAI/litellm#15728, BerriAI/litellm#12354).

jgreer013 and others added 2 commits June 17, 2026 10:14
The LiteLLM integration stored its live Span via
`metadata["_sentry_span"] = span`. litellm threads the caller's
`metadata` dict through to `litellm_params`, and some providers (e.g.
Anthropic's /v1/messages passthrough) forward that dict into the
outbound request body, so `json.dumps(request_body)` raised
`TypeError: Object of type Span is not JSON serializable` before the
request was even sent. The span (which holds verbatim prompts under
send_default_pii) could also leak to the provider.

Store the span off-band in a module-level registry keyed by
`litellm_call_id` (falling back to the callback kwargs identity)
instead of inside any litellm-visible dict, and remove the entry in
the terminal success/failure callback.

Fixes getsentry#6596

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Address self-review: the module-level span registry was only evicted by
the terminal success/failure callback, so a call abandoned before a
terminal callback fires (e.g. a stream the caller stops iterating) leaked
its Span entry -- holding prompt data -- for the process lifetime. The
prior kwargs-scoped storage was GC'd with the request, so this was a
regression.

Back the registry with an OrderedDict capped at _MAX_TRACKED_SPANS and
evict oldest-first in _store_span, so abandoned calls cannot grow it
unbounded. A WeakValueDictionary is not an option here: Span/Transaction
objects are not weakly referenceable.

Add tests for the bound, terminal-callback cleanup, and the
litellm_call_id-absent fallback key; correct the registry comment.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jgreer013 jgreer013 force-pushed the jgreer013/issue-6596 branch from 6fb4fa5 to 5a5f39b Compare June 18, 2026 04:29
@jgreer013 jgreer013 marked this pull request as ready for review June 18, 2026 04:36
@jgreer013 jgreer013 requested a review from a team as a code owner June 18, 2026 04:36

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 5a5f39b.

_spans_by_call[key] = span
_spans_by_call.move_to_end(key)
while len(_spans_by_call) > _MAX_TRACKED_SPANS:
    _spans_by_call.popitem(last=False)

Registry eviction skips span exit

Medium Severity

When _store_span caps the off-band registry, it removes evicted entries with popitem without calling __exit__ on the stored span, even though _input_callback already entered it. A request whose entry is evicted while still in flight then gets None from _peek_span on success and returns early, so the span never finishes and its completion data is dropped. This can happen under sustained concurrent load, when more than _MAX_TRACKED_SPANS calls are in flight at once.

Development

Successfully merging this pull request may close these issues.

LiteLLMIntegration stores a non-JSON-serializable Span in request metadata, breaking outbound LLM requests
