test: drop pending multicast responses before TOCTOU assertion#1701
Merged
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #1701 +/- ##
=======================================
Coverage 99.76% 99.76%
=======================================
Files 33 33
Lines 3410 3410
Branches 464 464
=======================================
Hits 3402 3402
Misses 5 5
Partials 3 3 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
9b372ed to
f9167cf
Compare
`test_service_info_async_request` patches `_is_complete` to False
to keep `async_request` looping. With `quick_timing`, the earlier
`async_get_service_info` queries leave multicast responses pending
in `out_queue` / `out_delay_queue` that snapshot the records as
they were *before* `async_update_service` swapped the registry
to `new_info`. When those stale answers flush after
`_clear_cache` they re-cache
`xxxyyy._test1-srvc-type._tcp.local. -> ash-1.local. -> 10.0.1.2`
and point `aiosinfo.server` at `ash-1.local.`. The patched loop
then keeps asking for an A record nobody answers, so the final
state stays on `10.0.1.2`:
AssertionError: assert [b'\\n\\x00\\x01\\x02'] == [b'\\n\\x00\\x01\\x03']
Drop the pending entries from both queues before clearing the
cache. The already-scheduled `loop.call_at` for each queue fires
on the empty deque and is a no-op, so this is deterministic and
adds no wall-clock delay.
f9167cf to
aaf120e
Compare
4 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
test_service_info_async_requestflakes on multiple CI runners (e.g. macOS 3.12 skip_cython, ubuntu 3.13 use_cython) with:Root cause
After tracing the captured packet log, the failure isn't a timeout / convergence problem — a stale queued multicast response ends up overwriting the correct one:
async_get_service_info/gathercalls trigger queries that the local zeroconf answers viaout_queue(orout_delay_queue). The queue snapshots the records (SRV →ash-1.local., A →10.0.1.2) and aggregates the send by up to ~620ms / ~1320ms.async_update_service(new_info)swaps the registry entry toash-2.local./10.0.1.3and broadcasts the new records.quick_timing, the broadcast finishes within ~30ms, so the stale queued response is still pending when the test reaches the assertion block._clear_cache(...)andasync_request(zc, 500)with_is_completepatched toFalse.ash-1.local. → 10.0.1.2and pointingaiosinfo.serveratash-1.local..ash-1.local., which nothing in the registry answers, so the final state stays on10.0.1.2and the assertion fails.Evidence from the failing ubuntu run (
MulticastOutgoingQueueflushing stale records right after the ash-2 broadcasts):…followed by repeated
questions=[a[…,ash-1.local.], quada[…,ash-1.local.]]queries from the AsyncServiceInfo loop.Fix
Drop the pending entries from both
out_queueandout_delay_queuebefore clearing the cache. The already-scheduledloop.call_atfor each queue fires on the empty deque and is a no-op (thewhile len(self.queue)…loop inasync_readydoesn't iterate, andif answers:is False), so this is deterministic and adds no wall-clock delay.A code-side fix for the underlying behaviour (invalidating queued responses when the registry changes) is left as a separate concern.
Test plan
SKIP_CYTHON=1 poetry run pytest --timeout=60 -q tests/test_asyncio.py::test_service_info_async_request— 5 consecutive runs pass (~4.15s each).poetry run pytest --timeout=60 -q tests/test_asyncio.py::test_service_info_async_request(with Cython extensions) — 3 consecutive runs pass.poetry run pre-commit run --files tests/test_asyncio.py— clean.