fix: bound QuestionHistory size to prevent LAN-driven OOM#1733
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #1733 +/- ##
=======================================
Coverage 99.76% 99.76%
=======================================
Files 33 33
Lines 3444 3455 +11
Branches 475 478 +3
=======================================
+ Hits 3436 3447 +11
Misses 5 5
Partials 3 3 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
You can't test _history directly because its not public in cython |
There was a problem hiding this comment.
Pull request overview
This PR mitigates a LAN-triggerable OOM vector by bounding the growth of QuestionHistory._history between the engine’s periodic cache cleanup ticks, preventing unbounded retention of distinct questions (and associated known-answer sets) during sustained query floods.
Changes:
- Introduces
_MAX_QUESTION_HISTORY_ENTRIESinconst.pyto capQuestionHistorysize. - Adds opportunistic expiry on insert and oldest-first eviction in
QuestionHistory.add_question_at_timewhen the cap is reached. - Updates the Cython
.pxdand adds focused unit tests validating cap behavior, FIFO eviction, and opportunistic expiry.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
src/zeroconf/const.py |
Adds _MAX_QUESTION_HISTORY_ENTRIES constant to configure the hard cap. |
src/zeroconf/_history.py |
Implements opportunistic expiry + oldest-first eviction when inserting past the cap. |
src/zeroconf/_history.pxd |
Declares the new constant for the Cython-compiled hot path. |
tests/test_history.py |
Adds tests covering bounded growth, FIFO eviction, and opportunistic expiry. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
PR Review — fix: bound QuestionHistory size to prevent LAN-driven OOMSolid security fix that mirrors the Non-blocking suggestions: (1) follow @bdraco's id=3270350098 — extract the slow-path body into a No blocking issues. The 🟢 Suggestions1. Consider extracting eviction body into a cdef helper (`src/zeroconf/_history.py`, L43-53)@bdraco's suggestion (id=3270350098) to extract the inner body into a separate
Suggested shape (in cdef void _evict_to_make_room(self, double now)In def _evict_to_make_room(self, now: _float) -> None:
oldest = next(iter(self._history))
oldest_than, _ = self._history[oldest]
if now - oldest_than > _DUPLICATE_QUESTION_INTERVAL:
self.async_expire(now)
while len(self._history) >= _MAX_QUESTION_HISTORY_ENTRIES:
del self._history[next(iter(self._history))]
def add_question_at_time(self, question, now, known_answers):
if question not in self._history and len(self._history) >= _MAX_QUESTION_HISTORY_ENTRIES:
self._evict_to_make_room(now)
self._history[question] = (now, known_answers)2. `cdef public` widens the supported surface (`src/zeroconf/_history.pxd`, L10)Flipping The trade-off worth flagging: 3. Tests inspect private `_history` dict — acceptable here, but pin a public symptom too (`tests/test_history.py`, L83-135)The three new tests all assert on That said, the cap is fundamentally about suppression semantics — if the cap evicts a question, that question is no longer suppressed. A complementary assertion via the public # After flooding past the cap, the very first question should no longer
# be suppressed because its history entry was evicted.
assert not history.suppresses(first, now, answers)Keep the Checklist
SummarySolid security fix that mirrors the Non-blocking suggestions: (1) follow @bdraco's id=3270350098 — extract the slow-path body into a No blocking issues. The |
Confirmed —
This also aligns with the 'test behavior, not implementation' guidance and keeps |
QuestionHistory._history was unbounded between the 10s cache-cleanup ticks. A LAN peer streaming distinct questions could grow the dict (and the retained known_answers sets) for ~10s before async_expire ran, easily reaching hundreds of MB at line-rate. Cap _history at _MAX_QUESTION_HISTORY_ENTRIES (10000). When the cap is hit on insert, first call async_expire(now) opportunistically to drop entries past _DUPLICATE_QUESTION_INTERVAL; if the dict is still full, evict the oldest insertion (dict is ordered) until there is room. Match the existing DNSCache cap pattern. Fixes python-zeroconf#1723
Rebase with requested adjustmentsBranch Changes applied
StatsActions performed
CI statusCI will be checked asynchronously. Automated by Kōan |
1998e43 to
fec39ac
Compare
|
Not issuing a CVE for this one since an attack of this size would require sustained traffic over a 10s period |
Rebase with requested adjustmentsBranch Changes applied
StatsActions performed
CI statusCI will be checked asynchronously. Automated by Kōan |
Rebase with requested adjustmentsBranch Changes applied
StatsActions performed
CI statusCI will be checked asynchronously. Automated by Kōan |
Summary
QuestionHistory._historywas unbounded between the periodic 10sasync_expireticks driven by_CACHE_CLEANUP_INTERVAL. A LAN peer streaming distinct questions could grow the dict (and the retainedknown_answerssets) for ~10s before the next cleanup, easily reaching hundreds of MB at line-rate.Cap
_historyat a configurable upper bound, with opportunistic expiry on insert and oldest-first FIFO eviction as a fallback. Mirrors the_MAX_CACHE_RECORDSpattern from #1718.Fixes #1723
Changes
_MAX_QUESTION_HISTORY_ENTRIES = 10000toconst.py.QuestionHistory.add_question_at_time, when the cap would be exceeded by a new entry, first callasync_expire(now)to drop entries past_DUPLICATE_QUESTION_INTERVAL; if still full, evict the oldest insertion (Pythondictpreserves insertion order) until there is room._history.pxdto declare the new constant for the Cython hot path.Test plan
tests/test_history.pycases:test_question_history_bounded— flooding past the cap keeps_historysize ≤ cap.test_question_history_evicts_oldest_first— earliest-inserted entry is the first dropped at cap.test_question_history_opportunistic_expire— stale entries are reclaimed before fresh ones are evicted.SKIP_CYTHON=1 poetry run pytest --timeout=60 tests→ 361 passed, 2 skipped.poetry run cython -3 src/zeroconf/_history.pycompiles cleanly with the updated.pxd.Generated by Kōan /fix
Quality Report
Changes: 4 files changed, 70 insertions(+), 1 deletion(-)
Code scan: clean
Tests: passed (4 PASSED)
Branch hygiene: clean
Generated by Kōan post-mission quality pipeline