[LiteLLM] _is_thinking_blocks_format drops Gemini thinking_blocks (only matches Anthropic 'signature' key) #5712

@ThibaultCoudertSephora

Description

Summary

google.adk.models.lite_llm._is_thinking_blocks_format, introduced in 1.28.0 via
fc45fa6 (PR closing #4801), gates Anthropic thinking_blocks parsing on the presence of a
per-block signature key:

# src/google/adk/models/lite_llm.py (main)
def _is_thinking_blocks_format(reasoning_value: Any) -> bool:
    """Returns True if reasoning_value is Anthropic thinking_blocks format."""
    if not isinstance(reasoning_value, list) or not reasoning_value:
        return False
    first = reasoning_value[0]
    return isinstance(first, dict) and "signature" in first

LiteLLM's Gemini integration also emits thinking_blocks when thinking is enabled on Gemini 2.5 / 3 models, but its per-block dicts carry no signature key: the thought
signatures are returned once per message, as a parallel array under provider_specific_fields.thought_signatures. The detector therefore returns False and execution falls
through to _iter_reasoning_texts, which only matches the dict keys ("text", "content", "reasoning", "reasoning_content"). Gemini blocks carry only "type" and "thinking",
so nothing is yielded and the response surfaces zero thought Parts to the agent layer.

Net effect: a regression from <1.28.0 for any agent built on LiteLlm + a Gemini thinking model.

Affected versions

  • google-adk >= 1.28.0 (still present on main, 2026-05-15)

Environment

  • Python 3.12
  • google-adk 1.28.0+
  • litellm latest
  • Models reproduced on: gemini-3-flash-preview, gemini-2.5-pro (via LiteLLM proxy)

Actual LiteLLM response payload

Captured directly from LiteLLM with thought output enabled. Note the shape of choices[0].message.thinking_blocks and the separate message-level
provider_specific_fields.thought_signatures field:

{
  "model": "gemini-3-flash-preview",
  "choices": [{
    "finish_reason": "stop",
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "I am a large language model, trained by Google.",
      "reasoning_content": "**Understanding the User's Query and My Identity** ...",
      "thinking_blocks": [
        {
          "type": "thinking",
          "thinking": "**Understanding the User's Query and My Identity** ..."
        }
      ],
      "provider_specific_fields": {
        "thought_signatures": [
          "AY89a1/RGkcaRoJvGVOsj0pMpznJpT6OZESRZQF8ZYxB1+YHABJ+NjzLIb0fk8FOFQ..."
        ]
      }
    }
  }],
  "usage": {
    "completion_tokens": 73,
    "prompt_tokens": 5,
    "total_tokens": 78,
    "completion_tokens_details": {"reasoning_tokens": 62, "text_tokens": 11}
  }
}

Call trace through main

  1. _extract_reasoning_value(message) prefers thinking_blocks over reasoning_content — returns the Gemini list.
  2. _convert_reasoning_value_to_parts(reasoning_value) calls _is_thinking_blocks_format(...), which returns False (no per-block signature).
  3. Falls back to _iter_reasoning_texts, which for each dict only yields under keys ("text", "content", "reasoning", "reasoning_content") — none present → yields nothing.
  4. Returned thought parts: []. The thought is lost.

Expected behavior

Gemini-shaped thinking_blocks should be recognized as a thinking-blocks payload and surfaced as Part(thought=True, text=...). The parallel signatures from
provider_specific_fields.thought_signatures should be attached to the corresponding thought parts so they can be relayed back to the model on subsequent turns.

Suggested fix

Normalize Gemini-shaped thinking_blocks into the Anthropic shape inside _extract_reasoning_value, by zipping the message-level thought_signatures onto each block. The
existing Anthropic codepath in _convert_reasoning_value_to_parts then handles both providers unchanged.

PR / unit tests below. Happy to open the PR if it looks right.

PR diff

src/google/adk/models/lite_llm.py:

  @@ def _extract_reasoning_value(message: Message | Delta | None) -> Any:
     if message is None:
       return None
     # Anthropic models return thinking_blocks with type/thinking/signature fields.
     # This must be preserved to maintain thinking across tool call boundaries.
     thinking_blocks = message.get("thinking_blocks")
     if thinking_blocks is not None:
  +    # Gemini also emits thinking_blocks, but each block lacks a per-block
  +    # `signature`; signatures arrive in parallel under
  +    # `provider_specific_fields.thought_signatures`. Zip them in so the
  +    # downstream Anthropic codepath handles both providers uniformly.
  +    if (
  +        isinstance(thinking_blocks, list)
  +        and thinking_blocks
  +        and isinstance(thinking_blocks[0], dict)
  +        and "signature" not in thinking_blocks[0]
  +    ):
  +      provider_fields = message.get("provider_specific_fields") or {}
  +      signatures = provider_fields.get("thought_signatures") or []
  +      if signatures:
  +        merged: list[dict] = []
  +        for index, block in enumerate(thinking_blocks):
  +          if (
  +              isinstance(block, dict)
  +              and index < len(signatures)
  +              and signatures[index]
  +          ):
  +            merged.append({**block, "signature": signatures[index]})
  +          else:
  +            merged.append(block)
  +        thinking_blocks = merged
       return thinking_blocks
     reasoning_content = message.get("reasoning_content")
     if reasoning_content is not None:
       return reasoning_content
     return message.get("reasoning")

A note for maintainers (worth adding to the PR description, not the code): Anthropic per-block signature is treated as an opaque token and stored on Part.thought_signature via
signature.encode("utf-8"). Gemini signatures are base64-encoded bytes. If Part.thought_signature is expected to hold the decoded bytes (matching the outbound b64encode(...) path
in _extract_thought_signature_from_tool_call's counterpart), _convert_reasoning_value_to_parts should base64.b64decode(signature) when the source is Gemini. Left out of this PR to
keep the diff surgical — happy to address as a follow-up once you confirm the desired semantics.
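If maintainers confirm the decoded-bytes semantics, the follow-up would look roughly like this (a sketch only; the defensive re-padding is an assumption, since it is unconfirmed whether LiteLLM strips base64 padding in transit):

```python
import base64

sig = "AY89a1/RGkc"  # truncated signature from the payload above

# Gemini thought signatures are base64 text; decoding yields the raw bytes.
# Re-pad defensively in case the transported string lost its trailing '='.
padded = sig + "=" * (-len(sig) % 4)
raw = base64.b64decode(padded)
print(len(raw))  # 8 bytes for this truncated sample
```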


Unit tests

Append to tests/unittests/models/test_litellm.py:

  def test_extract_reasoning_value_gemini_thinking_blocks_zips_signatures():
    """Gemini emits thinking_blocks without per-block signatures; signatures
    arrive in parallel under provider_specific_fields.thought_signatures.
    _extract_reasoning_value should normalize them into the Anthropic shape."""
    message = {
        "role": "assistant",
        "content": "I am a large language model.",
        "thinking_blocks": [
            {"type": "thinking", "thinking": "Step one ..."},
            {"type": "thinking", "thinking": "Step two ..."},
        ],
        "provider_specific_fields": {
            "thought_signatures": ["sig-1", "sig-2"],
        },
    }
    result = _extract_reasoning_value(message)
    assert result == [
        {"type": "thinking", "thinking": "Step one ...", "signature": "sig-1"},
        {"type": "thinking", "thinking": "Step two ...", "signature": "sig-2"},
    ]


  def test_extract_reasoning_value_gemini_thinking_blocks_without_signatures():
    """If provider_specific_fields is absent, Gemini thinking_blocks pass
    through unchanged. Downstream detector should still accept them once
    broadened — covered separately."""
    message = {
        "role": "assistant",
        "content": "Answer",
        "thinking_blocks": [
            {"type": "thinking", "thinking": "Inner monologue"},
        ],
    }
    result = _extract_reasoning_value(message)
    assert result == [{"type": "thinking", "thinking": "Inner monologue"}]


  def test_extract_reasoning_value_anthropic_thinking_blocks_unchanged():
    """Regression guard: Anthropic-shaped blocks (already carrying signature)
    must not be re-zipped or otherwise modified."""
    blocks = [
        {"type": "thinking", "thinking": "Anthropic thought", "signature": "abc"},
    ]
    message = {
        "role": "assistant",
        "content": "Answer",
        "thinking_blocks": blocks,
        "provider_specific_fields": {"thought_signatures": ["should-be-ignored"]},
    }
    result = _extract_reasoning_value(message)
    assert result == blocks
  

  def test_message_to_generate_content_response_gemini_thinking_blocks():
    """End-to-end: a Gemini-shaped message should surface a thought Part and
    the visible text Part, with the thought signature attached as bytes."""
    message = {
        "role": "assistant",
        "content": "I am a large language model.",
        "thinking_blocks": [
            {"type": "thinking", "thinking": "Identity check ..."},
        ],
        "provider_specific_fields": {
            "thought_signatures": ["AY89a1/RGkc"],
        },
    }
    response = _message_to_generate_content_response(message)
    assert len(response.content.parts) == 2
    thought_part = response.content.parts[0]
    text_part = response.content.parts[1]
    assert thought_part.thought is True
    assert thought_part.text == "Identity check ..."
    assert thought_part.thought_signature == b"AY89a1/RGkc"
    assert text_part.text == "I am a large language model."
