
feat(inference): add reasoning support to vertexai provider#5449

Open
major wants to merge 1 commit into llamastack:main from major:feat/vertexai-reasoning-support

Conversation

@major
Contributor

@major major commented Apr 6, 2026

What does this PR do?

Implements openai_chat_completions_with_reasoning() for the remote::vertexai inference provider so the Responses API can extract reasoning content when thinking is enabled on Gemini models.

Changes:

  • New openai_chat_completions_with_reasoning() method on VertexAIInferenceAdapter that wraps streaming chunks in OpenAIChatCompletionChunkWithReasoning
  • Converter now emits prior-turn reasoning_content as Gemini thinking parts (thought: True) for multi-turn reasoning conversations
  • Unit tests for the new method (parameterized streaming scenarios, non-streaming error, chunk preservation)
  • Converter tests for reasoning on assistant input messages

Closes #5448

Test Plan

Unit tests covering the new functionality:

uv run pytest tests/unit/providers/inference/vertexai/ -x --tb=short -q

Output:

266 passed in 0.28s

All pre-commit hooks pass.

Implement openai_chat_completions_with_reasoning() for the remote::vertexai
inference provider so the Responses API can extract reasoning content when
thinking is enabled on Gemini models.

The converter now emits prior-turn reasoning_content as Gemini thinking
parts (thought: True) for proper multi-turn reasoning conversations.
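The multi-turn conversion above can be sketched like this. The function name and the dict-shaped parts are assumptions for illustration; the actual converter and the google-genai `Part` construction in llama-stack may differ.

```python
# Hypothetical sketch of the converter behavior; real llama-stack code differs.
def assistant_message_to_parts(message: dict) -> list[dict]:
    """Convert a prior-turn assistant message into Gemini-style parts,
    emitting reasoning_content as a thinking part (thought: True)."""
    parts = []
    if message.get("reasoning_content"):
        # Prior-turn reasoning becomes a thinking part so Gemini sees it
        # as thought content, not as ordinary assistant text.
        parts.append({"text": message["reasoning_content"], "thought": True})
    if message.get("content"):
        parts.append({"text": message["content"]})
    return parts


parts = assistant_message_to_parts(
    {
        "role": "assistant",
        "content": "Paris.",
        "reasoning_content": "The user asked for France's capital.",
    }
)
```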

Signed-off-by: Major Hayden <major@mhtx.net>
@meta-cla bot added the CLA Signed label (managed by the Meta Open Source bot) on Apr 6, 2026
@mattf
Collaborator

mattf commented Apr 6, 2026

@robinnarsinghranabhat ptal

@robinnarsinghranabhat
Contributor

robinnarsinghranabhat commented Apr 8, 2026

@mattf @major In the interest of doing things correctly and improving llamastack responses, let's pause and think about how to handle providers that send actual reasoning versus those that send encrypted reasoning. Here's my rough take; it may have flaws, but we can polish it:

Propagating Encrypted Thought Signatures correctly

Open-source models (gpt-oss, llama-3) send actual reasoning tokens that should be extracted correctly from incoming streams and included in the next-turn input in multi-turn scenarios.

With closed-source providers like Gemini or OpenAI, the actual reasoning generated is encrypted. Gemini calls these thought_signatures, and we need to ensure they are preserved when propagated through llamastack.

For any practical agentic usage in multi-turn scenarios, the llamastack Responses API (llamastack -> Gemini) should internally build the same kind of input that would be built if we used the genai client directly, as in these examples.

Action Item 1: I don't see encrypted thought signatures preserved for the next turn, only thought summaries.

Ideally, ResponseReasoningItem.encrypted_content should be populated in the final output with the returned thought_signature. Per the official Responses API standard, include=["reasoning.encrypted_content"] is required to populate this encrypted content. One thing to consider: Gemini models can send thought_signature regardless of which thinking option is set. The default path in Responses might ignore them, while the reasoning path could ensure correct propagation.

And to emphasize: in the next-turn input for the genai client, they need to be placed correctly in the thought_signature field of Part.
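The round trip proposed in Action Item 1 could be sketched as below. All names here are assumptions for illustration; the field names on both the llama-stack side (`encrypted_content`) and the genai side (`thought_signature` on a part) follow the discussion above, not a verified API.

```python
# Hypothetical round-trip sketch for Action Item 1; field names are assumptions.
def extract_encrypted_reasoning(gemini_parts: list[dict]) -> list[str]:
    """Collect thought_signature blobs from returned parts so they can be
    stored on ResponseReasoningItem.encrypted_content (surfaced only when
    include=["reasoning.encrypted_content"] is requested)."""
    return [p["thought_signature"] for p in gemini_parts if "thought_signature" in p]


def restore_signatures(text: str, signatures: list[str]) -> list[dict]:
    """On the next turn, place each preserved signature back on a part,
    alongside the assistant's visible text."""
    parts: list[dict] = [{"text": text}]
    for sig in signatures:
        parts.append({"thought_signature": sig})
    return parts


# The signatures are opaque to llamastack; the only job is lossless transport.
sigs = extract_encrypted_reasoning([{"text": "summary"}, {"thought_signature": "opaque-bytes"}])
next_turn = restore_signatures("Answer", sigs)
```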

Thought Summaries are essentially a user convenience

They only seem to be a "safe to look at" summary of the actual reasoning. OpenAI says it hides the actual reasoning to "protect us" from potentially harmful content. I can't say what Google is protecting us from.

Action Item 2: Summaries should actually populate the ResponseReasoningItem.summary field

Test of Correctness ...

Action Item 3: Ensure the input passed to the genai client directly matches what the genai client receives when called internally through llamastack Responses. Alternatively, something like a BFCL eval should give similar accuracies.

@nidhishgajjar

Orb Code Review (powered by GLM-4.7 on Orb Cloud)

## Summary

I've reviewed the changes in this PR (PR #5449). The diff contains 293 lines.

## Analysis

The changes modify the codebase with the following considerations:

- Please ensure tests are included or updated
- Consider backward compatibility for API changes
- Verify documentation is updated if needed

## Assessment

🤔 Comment

I've reviewed this PR. Please provide more details about:

1. What problem this PR solves
2. Any breaking changes introduced
3. Test coverage for the new code

1 similar comment


Labels

CLA Signed (managed by the Meta Open Source bot)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

VertexAI provider: implement reasoning support via openai_chat_completions_with_reasoning

4 participants