fix: clear DOM cache after scroll to prevent stale data in extract #4632
Closed
wangshisan wants to merge 3 commits into browser-use:main
After a scroll action, the enhanced_dom_tree cache was not being invalidated, causing subsequent extract() calls to return stale DOM data from before the scroll. This led to agents seeing duplicate/unchanged content and incorrectly concluding that no new data was available after scrolling. Fix: call dom_watchdog.clear_cache() on successful scroll.
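A minimal sketch of the call-site behavior described above, under stated assumptions: `FakeDomWatchdog`, `FakePage`, `demo`, and the `clear_on_success` flag are hypothetical stand-ins invented for illustration, not the browser-use API. Only the names `enhanced_dom_tree`, `clear_cache()`, `scroll`, and `extract()` come from the description.

```python
import asyncio
from typing import Optional, Tuple


class FakeDomWatchdog:
    """Hypothetical stand-in for DOMWatchdog: holds one cached DOM snapshot."""

    def __init__(self) -> None:
        self.enhanced_dom_tree: Optional[str] = None

    def clear_cache(self) -> None:
        self.enhanced_dom_tree = None


class FakePage:
    """Hypothetical page whose visible content depends on the scroll offset."""

    def __init__(self) -> None:
        self.scroll_y = 0

    def snapshot(self) -> str:
        return f"rows visible at y={self.scroll_y}"


async def extract(page: FakePage, watchdog: FakeDomWatchdog) -> str:
    # Read path: reuse the cached tree when present, with no staleness check.
    if watchdog.enhanced_dom_tree is None:
        watchdog.enhanced_dom_tree = page.snapshot()
    return watchdog.enhanced_dom_tree


async def scroll(
    page: FakePage,
    watchdog: FakeDomWatchdog,
    pixels: int,
    *,
    clear_on_success: bool = True,
) -> None:
    page.scroll_y += pixels  # the scroll itself succeeds in this sketch
    if clear_on_success:
        # The proposed fix: drop the cached tree once the scroll has succeeded.
        watchdog.clear_cache()


async def demo(clear_on_success: bool) -> Tuple[str, str]:
    page, watchdog = FakePage(), FakeDomWatchdog()
    before = await extract(page, watchdog)
    await scroll(page, watchdog, 800, clear_on_success=clear_on_success)
    after = await extract(page, watchdog)
    return before, after


if __name__ == "__main__":
    print(asyncio.run(demo(clear_on_success=False)))
    print(asyncio.run(demo(clear_on_success=True)))
```

With `clear_on_success=False` the second `extract()` returns the pre-scroll snapshot. With the flag enabled it reflects the new scroll offset.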
c3838b1 to d5f6eb6
Member
Thanks for looking into this! This patches it at the tools.scroll() call site, which is just one entry point. Scroll also fires through the MCP server path and potentially other callers via ScrollEvent. Closing in favor of #4658, which fixes this at the event layer, but I appreciate your work on this.
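To illustrate why invalidating at the event layer covers every entry point, here is a rough sketch. It is not the implementation in #4658, which is not shown in this thread. `EventBus`, `DomCache`, `tools_scroll`, and `mcp_scroll` are hypothetical names. Only `ScrollEvent` is mentioned above, and its `pixels` field is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ScrollEvent:
    pixels: int  # assumed field, for illustration only


class EventBus:
    """Hypothetical synchronous event bus keyed by event type."""

    def __init__(self) -> None:
        self._handlers = defaultdict(list)

    def on(self, event_type: type, handler: Callable) -> None:
        self._handlers[event_type].append(handler)

    def dispatch(self, event: object) -> None:
        for handler in self._handlers[type(event)]:
            handler(event)


class DomCache:
    """Hypothetical cache holder standing in for the DOM watchdog."""

    def __init__(self) -> None:
        self.tree: Optional[str] = None

    def clear_cache(self) -> None:
        self.tree = None


def tools_scroll(bus: EventBus, pixels: int) -> None:
    # Entry point 1: the tools-layer scroll action.
    bus.dispatch(ScrollEvent(pixels))


def mcp_scroll(bus: EventBus, pixels: int) -> None:
    # Entry point 2: a scroll arriving through another path, such as an MCP server.
    bus.dispatch(ScrollEvent(pixels))


bus = EventBus()
cache = DomCache()
# One subscription invalidates the cache for every caller that emits ScrollEvent.
bus.on(ScrollEvent, lambda event: cache.clear_cache())
```

Because the invalidation is attached to the event rather than to one caller, neither entry point needs to remember to clear the cache itself.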
Summary
Fixes #4631
After a scroll action, the enhanced_dom_tree cache in DOMWatchdog was not being invalidated. This caused subsequent extract() calls to return stale DOM data from before the scroll, leading agents to see duplicate/unchanged content and incorrectly conclude that no new data was available after scrolling.
Problem
The agent then sees "duplicate data" → thinks "cannot get more" → may give up on pagination.
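The failure mode can be reproduced with a toy model. This is a sketch under assumptions, not the agent's real loop: `Feed`, `paginate`, and the `invalidate_cache` flag are hypothetical, and `cached_rows` stands in for the cached enhanced_dom_tree.

```python
from typing import List, Optional


class Feed:
    """Hypothetical paginated page: each scroll reveals the next batch of rows."""

    def __init__(self, batches: List[List[str]]) -> None:
        self.batches = batches
        self.position = 0
        self.cached_rows: Optional[List[str]] = None

    def scroll(self, invalidate_cache: bool) -> None:
        # Advance one batch, stopping at the last one.
        self.position = min(self.position + 1, len(self.batches) - 1)
        if invalidate_cache:
            self.cached_rows = None

    def extract(self) -> List[str]:
        # Return the cached rows if any, without checking whether they are stale.
        if self.cached_rows is None:
            self.cached_rows = list(self.batches[self.position])
        return self.cached_rows


def paginate(feed: Feed, invalidate_cache: bool, max_steps: int = 20) -> List[str]:
    collected: List[str] = []
    for _ in range(max_steps):
        new_rows = [row for row in feed.extract() if row not in collected]
        if not new_rows:
            # Only duplicates came back, so the loop concludes there is no more data.
            break
        collected.extend(new_rows)
        feed.scroll(invalidate_cache)
    return collected


batches = [["a", "b"], ["c", "d"], ["e"]]
stale_result = paginate(Feed(batches), invalidate_cache=False)
fresh_result = paginate(Feed(batches), invalidate_cache=True)
```

Without invalidation the loop stops after the first batch because the second `extract()` repeats it. With invalidation it collects all five rows and stops only when scrolling reveals nothing new.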
Root Cause
The scroll action in tools/service.py never called dom_watchdog.clear_cache().
Other operations that correctly clear the cache:
- read_file(source=page) (service.py:1736)
- AgentFocusChangedEvent (session.py:1141)
The cache read path at markdown_extractor.py:119-125 returns the cached value without any staleness check.
Fix
Add a dom_watchdog.clear_cache() call after successful scroll completion.
Changes
- browser_use/tools/service.py: Added 3 lines, a clear_cache() call after the successful scroll return
Verification
- ruff check passes
- ruff format passes
- pyright passes (0 errors, 0 warnings)
Summary by cubic
Clear the DOMWatchdog enhanced_dom_tree cache after a successful scroll so the next extract() reads fresh DOM. This prevents stale or duplicate content after pagination.
- browser_use/tools/service.py: call browser_session._dom_watchdog.clear_cache() on successful scroll.
- extract() doesn’t return pre-scroll cached DOM.
Written for commit 4acd164. Summary will update on new commits.