fix(semantic-memory): break ingestion retry loop on <think>-prefixed LLM responses by sscargal · Pull Request #1327 · MemMachine/MemMachine

sscargal · 2026-04-15T19:30:06Z

Purpose of the change

Qwen3.5 (and other reasoning-mode models) prefix responses with ... blocks. The OpenAI Responses client returned this text verbatim, pydantic rejected it in llm_feature_update, and the generic except branch in _process_single_set re-queued the message on every 2-second ingestion tick — burning LLM tokens indefinitely for any user on a reasoning-mode model.

Description

Fix A (packages/server/src/memmachine_server/common/language_model/openai_responses_language_model.py): strip leading ... blocks at the LLM client boundary. Applied at both output_text return sites in _generate_response; added a fallback in generate_parsed_response that strips and parses via json_repair + TypeAdapter when the SDK's output_parsed is None.

Fix B (packages/server/src/memmachine_server/semantic_memory/semantic_ingestion.py): catch pydantic.ValidationError in process_semantic_type and mark the message uid as ingested, mirroring the existing context_length_exceeded skip path. Narrow to ValidationError only — architect ruled out a broad (ValueError, TypeError) catch because those leak from the embedder and storage call sites. Generic except Exception and the _debug_fail_loudly re-raise are preserved so unknown errors stay loud.

Fix A closes the immediate trigger; Fix B is defense-in-depth against the next permanent-failure shape (provider schema drift, truncated responses) that will surface the same way.

Tests:

test_openai_responses_language_model.py: 3 new cases (leading , multi-line , output_parsed=None fallback).
test_semantic_ingestion.py: ValidationError marks message as ingested, and a guard test confirming generic Exception does NOT mark it — defends against future over-broad widening of the catch.

Fixes/Closes

Fixes #1270

Type of change

Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration.

Unit Test

Checklist

I have signed the commit(s) within this pull request
My code follows the style guidelines of this project (See STYLE_GUIDE.md)
I have performed a self-review of my own code
I have commented my code
My changes generate no new warnings
I have added unit tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
I have checked my code and corrected any misspellings

Maintainer Checklist

Confirmed all checks passed
Contributor has signed the commit(s)
Reviewed the code
Run, Tested, and Verified the change(s) work as expected

Screenshots/Gifs

N/A

Further comments

None

…LLM responses Qwen3.5 (and other reasoning-mode models) prefix responses with <think>...</think> blocks. The OpenAI Responses client returned this text verbatim, pydantic rejected it in llm_feature_update, and the generic except branch in _process_single_set re-queued the message on every 2-second ingestion tick — burning LLM tokens indefinitely for any user on a reasoning-mode model. Fix A (packages/server/src/memmachine_server/common/language_model/openai_responses_language_model.py): strip leading <think>...</think> blocks at the LLM client boundary. Applied at both output_text return sites in _generate_response; added a fallback in generate_parsed_response that strips and parses via json_repair + TypeAdapter when the SDK's output_parsed is None. Fix B (packages/server/src/memmachine_server/semantic_memory/semantic_ingestion.py): catch pydantic.ValidationError in process_semantic_type and mark the message uid as ingested, mirroring the existing context_length_exceeded skip path. Narrow to ValidationError only — architect ruled out a broad (ValueError, TypeError) catch because those leak from the embedder and storage call sites. Generic except Exception and the _debug_fail_loudly re-raise are preserved so unknown errors stay loud. Fix A closes the immediate trigger; Fix B is defense-in-depth against the next permanent-failure shape (provider schema drift, truncated responses) that will surface the same way. Tests: - test_openai_responses_language_model.py: 3 new cases (leading <think>, multi-line <think>, output_parsed=None fallback). - test_semantic_ingestion.py: ValidationError marks message as ingested, and a guard test confirming generic Exception does NOT mark it — defends against future over-broad widening of the catch. Refs: MemMachine#1270 Signed-off-by: Steve Scargall <37674041+sscargal@users.noreply.github.com>

Copilot

Pull request overview

Fixes an ingestion retry-loop in semantic memory ingestion caused by reasoning-mode LLM outputs that include <think>...</think> blocks, which previously led to repeated parsing failures and re-queueing.

Changes:

Strip <think>...</think> blocks from OpenAI Responses output_text, and add a fallback parse path when output_parsed is None.
Treat pydantic.ValidationError during semantic ingestion as non-retryable by marking the message as ingested (preventing infinite re-queues).
Add unit tests covering <think> stripping, parsed-response fallback parsing, and the new ValidationError ingestion behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File	Description
packages/server/src/memmachine_server/semantic_memory/semantic_ingestion.py	Marks messages as ingested on `ValidationError` to avoid permanent-failure retry loops.
packages/server/src/memmachine_server/common/language_model/openai_responses_language_model.py	Strips `<think>` blocks from responses and adds fallback structured parsing when SDK parsing doesn’t produce `output_parsed`.
packages/server/server_tests/memmachine_server/semantic_memory/test_semantic_ingestion.py	Adds tests ensuring `ValidationError` marks ingested and generic exceptions do not.
packages/server/server_tests/memmachine_server/common/language_model/test_openai_responses_language_model.py	Adds tests for `<think>` stripping and `output_parsed=None` fallback parsing.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Signed-off-by: Steve Scargall <37674041+sscargal@users.noreply.github.com>

edwinyyyu

What if I want the output to have a <think></think> block on purpose? What about another model that uses e.g. <reason></reason> blocks instead?

And would this apply to that model no matter which API it's going through? If so, I think some generalized optional sanitization utility would be preferable, that has many checkbox-like options for which sanitization operations to apply.

malatewang · 2026-04-30T18:53:38Z


 logger = logging.getLogger(__name__)

+_THINK_BLOCK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)


I think the response API is only supported by openAI, which does not return the "think" block.

I think the issue is using third-party models with the OpenAI API. OpenAI does a good job of avoiding this problem.

By baking the solution into the API wrapper instead of a sanitization layer, it can ruin intended outputs from a model that follows instructions well, and we have to play whack-a-mole on every single API which a model that does not follow instructions well may use.

sscargal requested review from Copilot, edwinyyyu and malatewang April 15, 2026 19:30

sscargal mentioned this pull request Apr 15, 2026

[Bug]: Dead loop to trigger memmachine_server.semantic_memory.semantic_ingestion when LLM reponse is invalid #1270

Open

Copilot started reviewing on behalf of sscargal April 15, 2026 19:31 View session

Copilot AI reviewed Apr 15, 2026

View reviewed changes

Comment thread packages/server/src/memmachine_server/common/language_model/openai_responses_language_model.py

Comment thread packages/server/src/memmachine_server/semantic_memory/semantic_ingestion.py

sscargal added 2 commits April 15, 2026 20:38

Fix lint issues

4c53003

Signed-off-by: Steve Scargall <37674041+sscargal@users.noreply.github.com>

Fixes ruff format --check --diff error

fcc8cc0

Signed-off-by: Steve Scargall <37674041+sscargal@users.noreply.github.com>

junttang mentioned this pull request Apr 17, 2026

[Feat]: Add retry logic for LLM judge and semantic memory consolidation on structured output failures #1332

Closed

edwinyyyu reviewed Apr 23, 2026

View reviewed changes

malatewang reviewed Apr 30, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(semantic-memory): break ingestion retry loop on <think>-prefixed LLM responses#1327

fix(semantic-memory): break ingestion retry loop on <think>-prefixed LLM responses#1327
sscargal wants to merge 3 commits into
MemMachine:mainfrom
sscargal:bugfix/1270-semantic-ingestion-infinite-retry-loop

sscargal commented Apr 15, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

edwinyyyu left a comment •

edited

Loading

Uh oh!

malatewang Apr 30, 2026

Uh oh!

edwinyyyu Apr 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants


		logger = logging.getLogger(__name__)

		_THINK_BLOCK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)

Conversation

sscargal commented Apr 15, 2026

Purpose of the change

Description

Fixes/Closes

Type of change

How Has This Been Tested?

Checklist

Maintainer Checklist

Screenshots/Gifs

Further comments

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

edwinyyyu left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

malatewang Apr 30, 2026

Choose a reason for hiding this comment

Uh oh!

edwinyyyu Apr 30, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

edwinyyyu left a comment •

edited

Loading