feat(inference): add reasoning support to vertexai provider #5449

major wants to merge 1 commit into llamastack:main
Conversation
Implement `openai_chat_completions_with_reasoning()` for the `remote::vertexai` inference provider so the Responses API can extract reasoning content when thinking is enabled on Gemini models. The converter now emits prior-turn `reasoning_content` as Gemini thinking parts (`thought: True`) for proper multi-turn reasoning conversations.

Signed-off-by: Major Hayden <major@mhtx.net>
@mattf @major In an effort towards doing things correctly:

**Propagating encrypted thought signatures correctly**

Open-source models (gpt-oss, llama-3) send actual reasoning tokens that should be extracted correctly from incoming streams and included in next-turn input in multi-turn scenarios, and the same holds with closed-source providers for any practical agentic usage.

Action item 1: Ideally, on the next-turn input for the genai client, these reasoning tokens need to be correctly placed.

**Thought summaries are essentially a user convenience**

They only seem to be a "safe to look at" summary of the actual reasoning. OpenAI says they hide actual reasoning to "protect us" from potentially harmful content; can't say what Google is protecting us from.

Action item 2: Summaries should actually be populating the ... Test of correctness: ...

Action item 3: Ensure that input passed to the genai client directly is similar to what the genai client gets when called internally through llama-stack Responses, or that something like a BFCL eval gives similar accuracies.
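To make the round-trip concern concrete, here is a minimal sketch of splitting a streamed Gemini content part into a reasoning delta versus a visible-content delta, so reasoning tokens can be carried into the next turn. The helper name `split_stream_part` and the plain-dict part shape (`text`, `thought` keys) are illustrative assumptions, not the PR's actual code.

```python
# Hypothetical sketch: route each streamed Gemini part to either the
# reasoning_content delta or the visible content delta. Field names
# ("text", "thought") follow the Gemini thinking-part convention
# described in this thread; the dict shape is an assumption.

def split_stream_part(part: dict) -> tuple[str, str]:
    """Return (reasoning_delta, content_delta) for one streamed part."""
    text = part.get("text", "")
    if part.get("thought"):
        # Thinking token: belongs in reasoning_content, not user-visible text.
        return text, ""
    # Ordinary token: user-visible content.
    return "", text
```

A streaming wrapper would accumulate the first element into `reasoning_content` and the second into `content`, so both survive into the next turn's input.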
**Orb Code Review** (powered by GLM-4.7 on Orb Cloud)

## Summary

I've reviewed the changes in this PR (PR #5449). The diff contains 293 lines.

## Analysis

The changes modify the codebase with the following considerations:

- Please ensure tests are included or updated
- Consider backward compatibility for API changes
- Verify documentation is updated if needed

## Assessment

🤔 Comment

I've reviewed this PR. Please provide more details about:

1. What problem this PR solves
2. Any breaking changes introduced
3. Test coverage for the new code
What does this PR do?
Implements `openai_chat_completions_with_reasoning()` for the `remote::vertexai` inference provider so the Responses API can extract reasoning content when thinking is enabled on Gemini models.

Changes:

- Adds an `openai_chat_completions_with_reasoning()` method on `VertexAIInferenceAdapter` that wraps streaming chunks in `OpenAIChatCompletionChunkWithReasoning`
- Emits prior-turn `reasoning_content` as Gemini thinking parts (`thought: True`) for multi-turn reasoning conversations

Closes #5448
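The second change above can be sketched as follows: a converter that turns a prior-turn OpenAI-style assistant message back into Gemini content parts, emitting `reasoning_content` first as a thinking part. The helper name `to_gemini_parts` and the plain-dict message/part shapes are assumptions for illustration; the actual adapter works with typed chunk objects.

```python
# Hypothetical sketch (not the adapter's actual code): convert one
# OpenAI-style assistant message into Gemini content parts, replaying
# prior-turn reasoning_content as a thinking part (thought: True) so
# multi-turn reasoning conversations are preserved.

def to_gemini_parts(message: dict) -> list[dict]:
    """Build the Gemini parts list for a prior assistant turn."""
    parts: list[dict] = []
    reasoning = message.get("reasoning_content")
    if reasoning:
        # Replay earlier reasoning as a thinking part.
        parts.append({"text": reasoning, "thought": True})
    content = message.get("content")
    if content:
        # Visible answer text follows as an ordinary part.
        parts.append({"text": content})
    return parts
```

Messages without `reasoning_content` degrade gracefully to a single ordinary text part, so non-thinking turns are unaffected.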
Test Plan
Unit tests covering the new functionality:
Output:
All pre-commit hooks pass.