Fix(workflow)/ add _process_content_object function to extract output from event.content before assigning it to child.output in _reconstruct_node_states #5668
Open
samarth1224 wants to merge 3 commits into
Conversation
…tract output from event.content object before assigning it to child.output, in _reconstruct_node_states and add unittest for _process_content_object and modify exisiting test test_scan_message_as_output
Collaborator
|
Hi @samarth1224 , Thank you for your contribution! Can you please fix the failing tests |
7bc9476 to
61c0d14
Compare
Author
|
Hi @rohityan , To resolve this without renaming the classes—which could have been disruptive—I’ve explicitly assigned test = False to them. This tells Pytest to skip them during the collection phase. Thanks |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Link to Issue
1. Link to an existing issue (if applicable):
NOTE
The reason for opening this PR is to provide an alternative approach to the one proposed in the PR #5587 I have compared both the approaches at the end based on my knowledge of codebase.
I have read the contributing guide, and I understand opening PR for the same issue is not advised, but I opened it to present a different solution, which could be taken in consideration.
Problem:
During fresh execution of a node with
message_as_output == True,process_llm_agent_outputextracts text from theContentparts, parses it, validates it against the schema, and assigns the resulting output toevent.output.However, before the event is persisted to the session,
_consume_event_queueoptimizes formessage_as_outputnodes by strippingevent.output(setting it toNone) and only saving the rawevent.contentUpon workflow resumption,
_reconstruct_node_statesrebuilds the node's state. Becauseevent.outputisNone, it fell back to assigning the rawevent.content(aContentobject) directly tochild.output.Solution:
Added
_process_content_object()to extract the output from the rawevent.contentduring rehydration.This function mirrors the logic used in the
process_llm_agent_output function in_llm_agent_wrapper.py` fileContentobject.Parts.thoughtparts to prevent Chain-of-Thought reasoning text from leaking into the final output.Testing Plan
I have added new unit tests for the
_process_content_objectfunction. These tests cover various scenarios, including plain text extraction, JSON parsing for structured outputs, and the correct filtering of thought parts to prevent internal reasoning from leaking into the final output.Additionally, I have updated the existing rehydration test
_test_scan_message_as_outputthat previously asserted the rawContentobject was returned; It now correctly verify that the output is reconstructed during node state rehydration.Unit Tests:
Summary Unit Test
Manual End-to-End (E2E) Tests:
I have already provided the necessary setup and instruction in the description of Issue #5553, along with minimal reproducible code.
The same setup can be used as Manual E2E test.
I am presentig the minimal reprouducible code and successful mitigation of the bug in the screenshot.
Output
Checklist
Comparison of Both the approaches.
PR #5587
_consume_event_queuein runner, which I feel defeats the purpose ofmessage_as_outputattribute ofNodeInfoclass, which clearly states, that there is no need to preserve output ifevent.contentis present.As stated in definition of NodeInfo class:
Therefore, I feel output is not needed to be preserved.
message_as_outputisTrueorFalse, in the_reconstruct_node_stateand_track_event_in_context, effectively becomes redundant.This PR
_process_llm_agent_outputfunction in_llm_agent_wrapper.py._reconstruct_node_statesall the way from the dynamic node scheduler.Conclusion:
This PR addresses a bug where
message_as_outputnodes fail to rehydrate the correct output, proposing a solution to processevent.contentduring resumption rather than persisting it. This approach adheres to the design ofmessage_as_outputby extracting, filtering, and parsing the content upon rehydration, offering an alternative to PR #5587.