Skip to content

fix: bound QuestionHistory size to prevent LAN-driven OOM#1733

Merged
bdraco merged 5 commits into
python-zeroconf:masterfrom
bluetoothbot:koan/fix-issue-1723
May 20, 2026
Merged

fix: bound QuestionHistory size to prevent LAN-driven OOM#1733
bdraco merged 5 commits into
python-zeroconf:masterfrom
bluetoothbot:koan/fix-issue-1723

Conversation

@bluetoothbot
Copy link
Copy Markdown
Contributor

@bluetoothbot bluetoothbot commented May 18, 2026

Summary

QuestionHistory._history was unbounded between the periodic 10s async_expire ticks driven by _CACHE_CLEANUP_INTERVAL. A LAN peer streaming distinct questions could grow the dict (and the retained known_answers sets) for ~10s before the next cleanup, easily reaching hundreds of MB at line-rate.

Cap _history at a configurable upper bound, with opportunistic expiry on insert and oldest-first FIFO eviction as a fallback. Mirrors the _MAX_CACHE_RECORDS pattern from #1718.

Fixes #1723

Changes

  • Add _MAX_QUESTION_HISTORY_ENTRIES = 10000 to const.py.
  • In QuestionHistory.add_question_at_time, when the cap would be exceeded by a new entry, first call async_expire(now) to drop entries past _DUPLICATE_QUESTION_INTERVAL; if still full, evict the oldest insertion (Python dict preserves insertion order) until there is room.
  • Update _history.pxd to declare the new constant for the Cython hot path.

Test plan

  • New tests/test_history.py cases:
    • test_question_history_bounded — flooding past the cap keeps _history size ≤ cap.
    • test_question_history_evicts_oldest_first — earliest-inserted entry is the first dropped at cap.
    • test_question_history_opportunistic_expire — stale entries are reclaimed before fresh ones are evicted.
  • Existing suppression + expiry tests still pass.
  • SKIP_CYTHON=1 poetry run pytest --timeout=60 tests → 361 passed, 2 skipped.
  • poetry run cython -3 src/zeroconf/_history.py compiles cleanly with the updated .pxd.

Generated by Kōan /fix


Quality Report

Changes: 4 files changed, 70 insertions(+), 1 deletion(-)

Code scan: clean

Tests: passed (4 PASSED)

Branch hygiene: clean

Generated by Kōan post-mission quality pipeline

@codecov
Copy link
Copy Markdown

codecov Bot commented May 18, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.76%. Comparing base (1d83550) to head (7eec3e8).
⚠️ Report is 2 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1733   +/-   ##
=======================================
  Coverage   99.76%   99.76%           
=======================================
  Files          33       33           
  Lines        3444     3455   +11     
  Branches      475      478    +3     
=======================================
+ Hits         3436     3447   +11     
  Misses          5        5           
  Partials        3        3           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented May 18, 2026

Merging this PR will not alter performance

✅ 11 untouched benchmarks


Comparing bluetoothbot:koan/fix-issue-1723 (7eec3e8) with master (1d83550)1

Open in CodSpeed

Footnotes

  1. No successful run was found on master (69bbcd1) during the generation of this report, so 1d83550 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

@bdraco
Copy link
Copy Markdown
Member

bdraco commented May 19, 2026

You can't test _history directly because its not public in cython

=========================== short test summary info ============================
FAILED tests/test_history.py::test_question_history_bounded - AttributeError: 'zeroconf._history.QuestionHistory' object has no attribute '_history'
FAILED tests/test_history.py::test_question_history_evicts_oldest_first - AttributeError: 'zeroconf._history.QuestionHistory' object has no attribute '_history'
FAILED tests/test_history.py::test_question_history_opportunistic_expire - AttributeError: 'zeroconf._history.QuestionHistory' object has no attribute '_history'
================== 3 failed, 358 passed, 2 skipped in 57.13s ===================

@bdraco
Copy link
Copy Markdown
Member

bdraco commented May 19, 2026

@bluetoothbot rr

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR mitigates a LAN-triggerable OOM vector by bounding the growth of QuestionHistory._history between the engine’s periodic cache cleanup ticks, preventing unbounded retention of distinct questions (and associated known-answer sets) during sustained query floods.

Changes:

  • Introduces _MAX_QUESTION_HISTORY_ENTRIES in const.py to cap QuestionHistory size.
  • Adds opportunistic expiry on insert and oldest-first eviction in QuestionHistory.add_question_at_time when the cap is reached.
  • Updates the Cython .pxd and adds focused unit tests validating cap behavior, FIFO eviction, and opportunistic expiry.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
src/zeroconf/const.py Adds _MAX_QUESTION_HISTORY_ENTRIES constant to configure the hard cap.
src/zeroconf/_history.py Implements opportunistic expiry + oldest-first eviction when inserting past the cap.
src/zeroconf/_history.pxd Declares the new constant for the Cython-compiled hot path.
tests/test_history.py Adds tests covering bounded growth, FIFO eviction, and opportunistic expiry.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/test_history.py Outdated
@bluetoothbot
Copy link
Copy Markdown
Contributor Author

bluetoothbot commented May 19, 2026

PR Review — fix: bound QuestionHistory size to prevent LAN-driven OOM

Solid security fix that mirrors the _MAX_CACHE_RECORDS pattern from #1718. The implementation is correct: insertion-ordered dict gives O(1) oldest-key access, the gate (question not in self._history and len >= cap) compiles to two C-level checks via the updated .pxd, opportunistic async_expire is correctly gated on the oldest entry's age (avoiding @Copilot's CPU-amplification concern from id=3270350098 area), and the FIFO fallback guarantees forward progress. The author's reply on id=3270294567 walking through the Cython codegen is accurate.

Non-blocking suggestions: (1) follow @bdraco's id=3270350098 — extract the slow-path body into a cdef helper to match the _cache.py::_async_evict_oldest() precedent; (2) consider cdef readonly over cdef public for _history to narrow the Python-visible surface to a getter; (3) add a suppresses()-based behavioural assertion to one of the new tests so the cap's user-visible consequence is pinned.

No blocking issues. The _history access from tests is now functional thanks to the public flip in the .pxd, addressing the test failures in id=4492383590.


🟢 Suggestions

1. Consider extracting eviction body into a cdef helper (`src/zeroconf/_history.py`, L43-53)

@bdraco's suggestion (id=3270350098) to extract the inner body into a separate cdef method aligns with the existing precedent in _cache.py, where DNSCache._async_evict_oldest() is split out from the main add path (see _cache.py:99-100). Two benefits:

  1. Hot-path leannessadd_question_at_time stays a tight, branchless-on-the-common-case method. The gate (question not in self._history and len(self._history) >= _MAX_QUESTION_HISTORY_ENTRIES) compiles to two C-level checks; on a miss the call to the helper is the only extra cost on the slow path, and it stays uninlined.
  2. Parity_cache.py already follows this shape for the same defense-in-depth pattern. Matching it makes the codebase consistent.

Suggested shape (in _history.pxd, alongside the existing cpdefs):

cdef void _evict_to_make_room(self, double now)

In _history.py:

def _evict_to_make_room(self, now: _float) -> None:
    oldest = next(iter(self._history))
    oldest_than, _ = self._history[oldest]
    if now - oldest_than > _DUPLICATE_QUESTION_INTERVAL:
        self.async_expire(now)
    while len(self._history) >= _MAX_QUESTION_HISTORY_ENTRIES:
        del self._history[next(iter(self._history))]

def add_question_at_time(self, question, now, known_answers):
    if question not in self._history and len(self._history) >= _MAX_QUESTION_HISTORY_ENTRIES:
        self._evict_to_make_room(now)
    self._history[question] = (now, known_answers)
if question not in self._history and len(self._history) >= _MAX_QUESTION_HISTORY_ENTRIES:
    oldest = next(iter(self._history))
    oldest_than, _ = self._history[oldest]
    if now - oldest_than > _DUPLICATE_QUESTION_INTERVAL:
        self.async_expire(now)
    while len(self._history) >= _MAX_QUESTION_HISTORY_ENTRIES:
        del self._history[next(iter(self._history))]
2. `cdef public` widens the supported surface (`src/zeroconf/_history.pxd`, L10)

Flipping cdef cython.dict _history to cdef public cython.dict _history was necessary to let the new tests inspect history._history from Python under the Cython build (per @bdraco's failure in id=4492383590). The cost is minor — Cython synthesises a Python-level getter/setter, but intra-Cython self._history access from cpdef/cdef methods still goes through the C struct slot, so the hot path is untouched.

The trade-off worth flagging: _history is now read/writable from Python by anyone importing the extension, not just tests. The underscore prefix signals "private" but doesn't enforce anything. If only read-access is needed for tests, consider cdef readonly cython.dict _history instead — same getter, no setter, slightly narrower surface. Not blocking; just calling out the deliberate widening so the precedent is intentional.

cdef public cython.dict _history
3. Tests inspect private `_history` dict — acceptable here, but pin a public symptom too (`tests/test_history.py`, L83-135)

The three new tests all assert on history._history directly. The CLAUDE.md guidance is to test observable behaviour over implementation; here, the observable behaviour the cap protects is bounded memory, which there is no public API to query, so reaching into _history is the only path. That's defensible.

That said, the cap is fundamentally about suppression semantics — if the cap evicts a question, that question is no longer suppressed. A complementary assertion via the public suppresses() API would lock in the user-visible consequence (and survive a future refactor that renames _history):

# After flooding past the cap, the very first question should no longer
# be suppressed because its history entry was evicted.
assert not history.suppresses(first, now, answers)

Keep the _history length/identity assertions as the precise check; add a suppresses() assertion as the behavioural anchor. Optional, not blocking.


Checklist

  • Bounded growth on hot path
  • No CPU amplification under sustained flood
  • Cython hot path compiles to C-level comparisons
  • .pxd declarations match .py types
  • Test coverage of cap, FIFO eviction, opportunistic expiry
  • Tests verify observable behaviour, not internal state — suggestion #4
  • Code follows precedent (parity with _cache.py) — suggestion #1
  • No widened public API surface beyond what's required — suggestion #2

Summary

Solid security fix that mirrors the _MAX_CACHE_RECORDS pattern from #1718. The implementation is correct: insertion-ordered dict gives O(1) oldest-key access, the gate (question not in self._history and len >= cap) compiles to two C-level checks via the updated .pxd, opportunistic async_expire is correctly gated on the oldest entry's age (avoiding @Copilot's CPU-amplification concern from id=3270350098 area), and the FIFO fallback guarantees forward progress. The author's reply on id=3270294567 walking through the Cython codegen is accurate.

Non-blocking suggestions: (1) follow @bdraco's id=3270350098 — extract the slow-path body into a cdef helper to match the _cache.py::_async_evict_oldest() precedent; (2) consider cdef readonly over cdef public for _history to narrow the Python-visible surface to a getter; (3) add a suppresses()-based behavioural assertion to one of the new tests so the cap's user-visible consequence is pinned.

No blocking issues. The _history access from tests is now functional thanks to the public flip in the .pxd, addressing the test failures in id=4492383590.


Automated review by Kōan1a1f974
da453c5
fec39ac
ad93f34

@bluetoothbot
Copy link
Copy Markdown
Contributor Author

@bdraco: You can't test _history directly because its not public in cython

Confirmed — _history is a cdef attribute on the QuestionHistory extension class and the .pxd doesn't expose it as public, so the new tests work under SKIP_CYTHON=1 and break against the compiled wheel. The cleaner fix is to test against the public suppresses() contract rather than widen the internal surface:

  • test_question_history_bounded — after the flood, assert that an entry inserted at the very start of the burst no longer suppresses (its slot has been evicted), and that an entry inserted near the end still does. That pins the cap behaviour without ever reading len(_history).
  • test_question_history_evicts_oldest_first — assert not history.suppresses(first, now, set()) after the flood, plus history.suppresses(last_inserted, now, set()) for one of the recently added ones.
  • test_question_history_opportunistic_expire — assert history.suppresses(fresh, fresh_now, set()) and not history.suppresses(any_stale, fresh_now, set()).

This also aligns with the 'test behavior, not implementation' guidance and keeps _history properly internal. Happy to push that revision.

QuestionHistory._history was unbounded between the 10s cache-cleanup
ticks. A LAN peer streaming distinct questions could grow the dict
(and the retained known_answers sets) for ~10s before async_expire
ran, easily reaching hundreds of MB at line-rate.

Cap _history at _MAX_QUESTION_HISTORY_ENTRIES (10000). When the cap
is hit on insert, first call async_expire(now) opportunistically to
drop entries past _DUPLICATE_QUESTION_INTERVAL; if the dict is still
full, evict the oldest insertion (dict is ordered) until there is
room. Match the existing DNSCache cap pattern.

Fixes python-zeroconf#1723
@bluetoothbot
Copy link
Copy Markdown
Contributor Author

Rebase with requested adjustments

Branch koan/fix-issue-1723 was rebased onto master and review feedback was applied.

Changes applied

  • Updated comment on tests/test_history.py:107 per Copilot review. Comment now states loop adds cap entries (one past cap) to force oldest-first eviction of first.
  • CI fix: Applied pre-push CI fix
  • CI fix: Pre-push CI fix applied

Stats

4 files changed, 72 insertions(+), 2 deletions(-)
Actions performed
  • Already-solved check: negative (confidence=high, reasoning=Master has no _MAX_QUESTION_HISTORY_ENTRIES constant or QuestionHistory bounding logic; recent OOM-d)
  • Rebased koan/fix-issue-1723 onto upstream/master
  • Applied review feedback
  • Pre-push CI check: previous run #26020183021 failed
  • Applied pre-push CI fix
  • Pre-push CI fix applied
  • Force-pushed koan/fix-issue-1723 to origin
  • CI check enqueued in ## CI (async)

CI status

CI will be checked asynchronously.


Automated by Kōan

@bdraco
Copy link
Copy Markdown
Member

bdraco commented May 19, 2026

Not issuing a CVE for this one since an attack of this size would require sustained traffic over a 10s period

Comment thread src/zeroconf/_history.py
@bdraco
Copy link
Copy Markdown
Member

bdraco commented May 19, 2026

@bluetoothbot rr

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Comment thread src/zeroconf/_history.py Outdated
@bluetoothbot
Copy link
Copy Markdown
Contributor Author

Rebase with requested adjustments

Branch koan/fix-issue-1723 was rebased onto master and review feedback was applied.

Changes applied

  • Done.
  • Summary

  • Skip the full async_expire(now) sweep when nothing can be reclaimed. Peek at the head of the insertion-ordered _history dict; only run the O(n) sweep when the oldest entry has aged past _DUPLICATE_QUESTION_INTERVAL. Addresses @Copilot's CPU-amplification concern at _history.py:50 and @bluetoothbot's suggested shape at _history.py:46 — under a sustained flood inside the 999ms suppression window, inserts no longer pay a wasted ~10k-iteration scan; they fall straight through to oldest-first eviction.
  • Test comment at tests/test_history.py:107 was already corrected in commit da453c5; no further change needed.
  • cdef public _history layout change flagged by @bluetoothbot was self-cleared in the same review thread — same pattern already in _cache.pxd; no action required.

Stats

4 files changed, 75 insertions(+), 2 deletions(-)
Actions performed
  • Already-solved check: negative (confidence=high, reasoning=No commit on master bounds QuestionHistory size; fix: bound DNSCache record count to prevent unbounded LAN-driven growth #1718 bounds DNSCache only, leaving _history unboun)
  • Rebased koan/fix-issue-1723 onto upstream/master
  • Applied review feedback
  • Pre-push CI check: previous run passed
  • Force-pushed koan/fix-issue-1723 to origin
  • CI check enqueued in ## CI (async)

CI status

CI will be checked asynchronously.


Automated by Kōan

Comment thread src/zeroconf/_history.py Outdated
@bdraco
Copy link
Copy Markdown
Member

bdraco commented May 19, 2026

@bluetoothbot rr

@bluetoothbot
Copy link
Copy Markdown
Contributor Author

Rebase with requested adjustments

Branch koan/fix-issue-1723 was rebased onto master and review feedback was applied.

Changes applied

  • Summary

  • Extracted the at-cap eviction logic from QuestionHistory.add_question_at_time into a dedicated _evict_to_make_room(now) helper, declared as cdef void in _history.pxd (with cython.locals for oldest, oldest_entry, oldest_than) — mirrors the _async_evict_oldest split in _cache.py/_cache.pxd per @bdraco's request.
  • Placed the new helper at the bottom of the class to keep public API at the top per project convention.
  • Test comment (@Copilot) and opportunistic-expiry perf gate (@Copilot/@bdraco perf concern) were already addressed in da53c5 and ad93f34; no further change.

Stats

4 files changed, 85 insertions(+), 2 deletions(-)
Actions performed
  • Already-solved check: negative (confidence=high, reasoning=No commit on master bounds QuestionHistory size; fix: bound DNSCache record count to prevent unbounded LAN-driven growth #1718 bounded DNSCache records, a separate structur)
  • Rebased koan/fix-issue-1723 onto upstream/master
  • Applied review feedback
  • Pre-push CI check: previous run passed
  • Force-pushed koan/fix-issue-1723 to origin
  • CI check enqueued in ## CI (async)

CI status

CI will be checked asynchronously.


Automated by Kōan

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.

@bdraco bdraco marked this pull request as ready for review May 20, 2026 00:05
@bdraco bdraco merged commit 0e5e637 into python-zeroconf:master May 20, 2026
40 checks passed
@bluetoothbot bluetoothbot deleted the koan/fix-issue-1723 branch May 20, 2026 03:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Security: QuestionHistory grows unbounded between 10s cleanup ticks under a question flood

3 participants