
feat(inference): add reasoning support to vertexai provider#5449

Open
major wants to merge 1 commit into llamastack:main from major:feat/vertexai-reasoning-support

Conversation

@major
Contributor

@major major commented Apr 6, 2026

What does this PR do?

Implements openai_chat_completions_with_reasoning() for the remote::vertexai inference provider so the Responses API can extract reasoning content when thinking is enabled on Gemini models.

Changes:

  • New openai_chat_completions_with_reasoning() method on VertexAIInferenceAdapter that wraps streaming chunks in OpenAIChatCompletionChunkWithReasoning
  • Converter now emits prior-turn reasoning_content as Gemini thinking parts (thought: True) for multi-turn reasoning conversations
  • Unit tests for the new method (parameterized streaming scenarios, non-streaming error, chunk preservation)
  • Converter tests for reasoning on assistant input messages

Closes #5448

Test Plan

Unit tests covering the new functionality:

uv run pytest tests/unit/providers/inference/vertexai/ -x --tb=short -q

Output:

266 passed in 0.28s

All pre-commit hooks pass.

Implement openai_chat_completions_with_reasoning() for the remote::vertexai
inference provider so the Responses API can extract reasoning content when
thinking is enabled on Gemini models.

The converter now emits prior-turn reasoning_content as Gemini thinking
parts (thought: True) for proper multi-turn reasoning conversations.
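The multi-turn conversion above can be sketched like this. The function name and the dict-shaped parts are assumptions for illustration; the actual converter and the google-genai `Part` construction in llama-stack may differ.

```python
# Hypothetical sketch of the converter behavior; real llama-stack code differs.
def assistant_message_to_parts(message: dict) -> list[dict]:
    """Convert a prior-turn assistant message into Gemini-style parts,
    emitting reasoning_content as a thinking part (thought: True)."""
    parts = []
    if message.get("reasoning_content"):
        # Prior-turn reasoning becomes a thinking part so Gemini sees it
        # as thought content, not as ordinary assistant text.
        parts.append({"text": message["reasoning_content"], "thought": True})
    if message.get("content"):
        parts.append({"text": message["content"]})
    return parts


parts = assistant_message_to_parts(
    {
        "role": "assistant",
        "content": "Paris.",
        "reasoning_content": "The user asked for France's capital.",
    }
)
```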

Signed-off-by: Major Hayden <major@mhtx.net>
@meta-cla bot added the CLA Signed label (managed by the Meta Open Source bot) on Apr 6, 2026
@mattf
Collaborator

mattf commented Apr 6, 2026

@robinnarsinghranabhat ptal

@robinnarsinghranabhat
Contributor

robinnarsinghranabhat commented Apr 8, 2026

@mattf @major In the interest of doing things correctly and improving llamastack responses, let's pause and think about how to handle providers that send actual reasoning versus those that send encrypted reasoning. Here's my rough take; it may have flaws, but we can polish it:

Propagating Encrypted Thought Signatures correctly

Open-source models (gpt-oss, llama-3) send actual reasoning tokens that should be extracted correctly from incoming streams and included in the next-turn input in multi-turn scenarios.

With closed-source providers like Gemini or OpenAI, the actual reasoning generated is encrypted. Gemini calls these thought_signatures, and we need to ensure they are preserved when propagated through llamastack.

For any practical agentic usage in multi-turn scenarios, the llamastack Responses API (llamastack -> Gemini) should internally build the same kind of input that would be built if we used the genai client directly, as in these examples.

Action Item 1: I don't see encrypted thought signatures preserved for the next turn, only thought summaries.

Ideally, ResponseReasoningItem.encrypted_content should be populated in the final output with the returned thought_signature. Per the official Responses API standard, include=["reasoning.encrypted_content"] is required to populate this encrypted content. One thing to consider: Gemini models can send thought_signature regardless of which thinking option is set. The default path in Responses might ignore them, while the reasoning path could ensure correct propagation.

And to emphasize: in the next-turn input for the genai client, they need to be placed correctly in the thought_signature field of Part.
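The round trip proposed in Action Item 1 could be sketched as below. All names here are assumptions for illustration; the field names on both the llama-stack side (`encrypted_content`) and the genai side (`thought_signature` on a part) follow the discussion above, not a verified API.

```python
# Hypothetical round-trip sketch for Action Item 1; field names are assumptions.
def extract_encrypted_reasoning(gemini_parts: list[dict]) -> list[str]:
    """Collect thought_signature blobs from returned parts so they can be
    stored on ResponseReasoningItem.encrypted_content (surfaced only when
    include=["reasoning.encrypted_content"] is requested)."""
    return [p["thought_signature"] for p in gemini_parts if "thought_signature" in p]


def restore_signatures(text: str, signatures: list[str]) -> list[dict]:
    """On the next turn, place each preserved signature back on a part,
    alongside the assistant's visible text."""
    parts: list[dict] = [{"text": text}]
    for sig in signatures:
        parts.append({"thought_signature": sig})
    return parts


# The signatures are opaque to llamastack; the only job is lossless transport.
sigs = extract_encrypted_reasoning([{"text": "summary"}, {"thought_signature": "opaque-bytes"}])
next_turn = restore_signatures("Answer", sigs)
```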

Thought Summaries are essentially a user convenience

They only seem to be a "safe to look at" summary of the actual reasoning. OpenAI says it hides the actual reasoning to "protect us" from potentially harmful content. I can't say what Google is protecting us from.

Action Item 2: Summaries should actually populate the ResponseReasoningItem.summary field

Test of Correctness ...

Action Item 3: Ensure the input passed to the genai client directly matches what the genai client receives when called internally through llamastack Responses. Alternatively, something like a BFCL eval should give similar accuracies.

@nidhishgajjar

Orb Code Review (powered by GLM-4.7 on Orb Cloud)

## Summary

I've reviewed the changes in this PR (PR #5449). The diff contains 293 lines.

## Analysis

The changes modify the codebase with the following considerations:

- Please ensure tests are included or updated
- Consider backward compatibility for API changes
- Verify documentation is updated if needed

## Assessment

🤔 Comment

I've reviewed this PR. Please provide more details about:

1. What problem this PR solves
2. Any breaking changes introduced
3. Test coverage for the new code

1 similar comment


Labels

CLA Signed (managed by the Meta Open Source bot)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

VertexAI provider: implement reasoning support via openai_chat_completions_with_reasoning

4 participants