fix: clear DOM cache after scroll to prevent stale data in extract #4632

Closed
wangshisan wants to merge 3 commits into browser-use:main from wangshisan:fix/scroll-clear-dom-cache

@wangshisan wangshisan commented Apr 7, 2026

Summary

Fixes #4631

After a scroll action, the enhanced_dom_tree cache in DOMWatchdog was not being invalidated. This caused subsequent extract() calls to return stale DOM data from before the scroll, leading agents to see duplicate/unchanged content and incorrectly conclude that no new data was available after scrolling.

Problem

Step 1: extract() → builds DOM → caches as enhanced_dom_tree ("18 posts")
Step 2: scroll(down=1) → browser scrolls, new content loads ✅
         → BUT cache NOT cleared! ❌
Step 3: extract() → returns cached OLD data ("18 posts") ← BUG

The agent then sees "duplicate data" → thinks "cannot get more" → may give up on pagination.
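The three steps above can be reproduced in a toy model. This is a minimal sketch, not browser-use code: `FakePage` and `ToyDomCache` are hypothetical stand-ins, and only the names `enhanced_dom_tree`, `extract()` and `clear_cache()` are borrowed from the description.

```python
# Toy model only: FakePage and ToyDomCache are hypothetical stand-ins,
# not classes from browser-use.
class FakePage:
    def __init__(self):
        self.visible_posts = 18

    def scroll_down(self):
        # Simulate lazy loading: scrolling reveals ten more posts.
        self.visible_posts += 10


class ToyDomCache:
    def __init__(self, page):
        self.page = page
        self.enhanced_dom_tree = None  # cached DOM snapshot

    def extract(self):
        # Cache read path with no staleness check: once a snapshot
        # exists it is returned as-is, however old it is.
        if self.enhanced_dom_tree is None:
            self.enhanced_dom_tree = f"{self.page.visible_posts} posts"
        return self.enhanced_dom_tree

    def clear_cache(self):
        self.enhanced_dom_tree = None


page = FakePage()
dom = ToyDomCache(page)

first = dom.extract()   # Step 1: builds and caches the snapshot
page.scroll_down()      # Step 2: page changes, cache is untouched
stale = dom.extract()   # Step 3: still the pre-scroll snapshot

dom.clear_cache()       # what the fix adds after a scroll
fresh = dom.extract()   # rebuilt from the current page state
```

In this model `stale` equals `first` even though the page has changed, and only the explicit `clear_cache()` call makes the next `extract()` rebuild.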

Root Cause

The scroll action in tools/service.py never called dom_watchdog.clear_cache().

Other operations that correctly clear the cache:

The cache read path at markdown_extractor.py:119-125 returns the cached value without any staleness check.

Fix

Add a dom_watchdog.clear_cache() call after the scroll completes successfully:

# After successful scroll, clear DOM cache to ensure next extract gets fresh content
if browser_session._dom_watchdog:
    browser_session._dom_watchdog.clear_cache()
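The guard above can be exercised in isolation. The sketch below wraps it in a helper and runs it against stubs; `clear_dom_cache_after_scroll`, `StubWatchdog` and the `scroll_succeeded` flag are hypothetical names introduced for illustration, and only `_dom_watchdog` and `clear_cache()` come from the PR.

```python
from types import SimpleNamespace


def clear_dom_cache_after_scroll(browser_session, scroll_succeeded):
    """Hypothetical helper: clear the DOM cache only after a successful
    scroll and only if a watchdog is attached. Returns True if cleared."""
    if not scroll_succeeded:
        return False
    watchdog = getattr(browser_session, "_dom_watchdog", None)
    if watchdog is None:
        return False
    watchdog.clear_cache()
    return True


class StubWatchdog:
    """Stand-in for the real watchdog; holds a fake cached snapshot."""

    def __init__(self):
        self.enhanced_dom_tree = "18 posts"

    def clear_cache(self):
        self.enhanced_dom_tree = None


session = SimpleNamespace(_dom_watchdog=StubWatchdog())
cleared = clear_dom_cache_after_scroll(session, scroll_succeeded=True)

# A session without a watchdog is left alone instead of raising.
bare_session = SimpleNamespace(_dom_watchdog=None)
skipped = clear_dom_cache_after_scroll(bare_session, scroll_succeeded=True)
```

Checking for the watchdog before calling it keeps the scroll action from failing on sessions where no DOM watchdog has been attached.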

Changes

Verification

  • ruff check passes
  • ruff format passes
  • pyright passes (0 errors, 0 warnings)

Summary by cubic

Clear the DOMWatchdog enhanced_dom_tree cache after a successful scroll so the next extract() reads fresh DOM. This prevents stale or duplicate content after pagination.

  • Bug Fixes
    • In browser_use/tools/service.py, call browser_session._dom_watchdog.clear_cache() on successful scroll.
    • Ensures extract() doesn’t return pre-scroll cached DOM.

Written for commit 4acd164. Summary will update on new commits.


CLAassistant commented Apr 7, 2026

CLA assistant check
All committers have signed the CLA.

@cubic-dev-ai cubic-dev-ai Bot left a comment

No issues found across 1 file

After a scroll action, the enhanced_dom_tree cache was not being
invalidated, causing subsequent extract() calls to return stale
DOM data from before the scroll. This led to agents seeing
duplicate/unchanged content and incorrectly concluding that no
new data was available after scrolling.

Fix: call dom_watchdog.clear_cache() on successful scroll.
@wangshisan wangshisan force-pushed the fix/scroll-clear-dom-cache branch from c3838b1 to d5f6eb6 Compare April 7, 2026 13:53

laithrw commented Apr 11, 2026

Thanks for looking into this! This patches it at the tools.scroll() call site, which is just one entry point. Scroll also fires through the MCP server path and potentially other callers via ScrollEvent. Closing in favor of #4658, which fixes this at the event layer, but I appreciate your work on this.
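To illustrate the difference between patching one call site and handling it at the event layer, here is a rough sketch. It is not the implementation from #4658, which is not shown in this thread: `TinyEventBus`, `ToyDomWatchdog`, the `ScrollEvent` fields and both caller functions are assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ScrollEvent:
    # Field names are illustrative, not the real event's schema.
    direction: str
    pages: float = 1.0


class TinyEventBus:
    """Hypothetical minimal bus: handlers are registered per event type."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def on(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def dispatch(self, event):
        for handler in self._handlers[type(event)]:
            handler(event)


class ToyDomWatchdog:
    def __init__(self, bus):
        self.enhanced_dom_tree = "18 posts"  # pretend a snapshot is cached
        # Subscribe once; every ScrollEvent now invalidates the cache.
        bus.on(ScrollEvent, self.on_scroll)

    def on_scroll(self, event):
        self.clear_cache()

    def clear_cache(self):
        self.enhanced_dom_tree = None


def tools_scroll(bus):
    # One entry point (stand-in for the tools call site).
    bus.dispatch(ScrollEvent(direction="down"))


def mcp_scroll(bus):
    # A second entry point (stand-in for the MCP server path).
    bus.dispatch(ScrollEvent(direction="down"))


bus = TinyEventBus()
watchdog = ToyDomWatchdog(bus)
mcp_scroll(bus)  # cache is cleared even though tools_scroll was never called
```

Because invalidation hangs off the event rather than a specific caller, any new code path that dispatches a `ScrollEvent` gets the cache reset without further changes.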

@laithrw laithrw closed this Apr 11, 2026

Development

Successfully merging this pull request may close these issues.

bug: scroll action does not invalidate enhanced_dom_tree cache, causing extract() to return stale data