test: shave register/shutdown/TC waits on loopback by bluetoothbot · Pull Request #1711 · python-zeroconf/python-zeroconf

bluetoothbot · 2026-05-18T00:03:04Z

What

Speed up four slow tests by extracting one production constant (_REGISTER_RANDOM_INTERVAL, the RFC 6762 §8.1 random pre-probe wait in async_check_service) and patching it through quick_timing on loopback, plus three localised test-only patches.

Part of #1707.

Why

async_check_service waits a random 150-250ms before probing — RFC 6762 §8.1 thundering-herd avoidance for real networks, pure overhead on 127.0.0.1. Tests that use quick_timing already get probe/announce/goodbye intervals shrunk, but the inline random.randint on this path is opaque to patching. Three of the slowest tests in #1707 without an in-flight PR (test_async_service_registration_name_conflict, test_name_conflicts, and everything that registers a service) pay this for nothing.

test_shutdown_loop and test_tc_bit_defers_last_response_missing carry similar dead waits — a 1s outer timeout in shutdown_loop against a never-completing pending task, and an 800ms cleanup poll against a 400-500ms RFC 6762 §7.2 TC-bit defer timer.

How

Extract _REGISTER_RANDOM_INTERVAL = (150, 250) as a module-level constant in _core.py and patch it in quick_timing alongside the other loopback shaves. No production change — default still matches the RFC value.
test_shutdown_loop: patch _TASK_AWAIT_TIMEOUT to 0.1s and drop .result(1) → .result(0.1). Both were just upper bounds on a never-completing pending task.
test_tc_bit_defers_last_response_missing: patch _TC_DELAY_RANDOM_INTERVAL to (10, 20)ms and tighten the cleanup poll loop.
test_integration: wait for the browser's first startup query to fire (cache empty) before registering. The original test relied on the 150-250ms register-random-wait being longer than the browser's 20-120ms first-query delay; with the new fast register that race flipped and the first query started picking up the known PTR via §7.1 known-answer suppression.

Speedups (CPython 3.12, loopback)

test	before	after
`test_shutdown_loop`	1.16s	0.26s
`test_tc_bit_defers_last_response_missing`	0.79s	0.32s
`test_async_service_registration_name_conflict`	0.72s	0.10s
`test_name_conflicts`	0.50s	0.10s

Plus a smaller transparent shave on every other quick_timing-using test that registers a service.

Test plan

Targeted tests pass three times in a row
Full tests/ suite passes (338 passed, 2 skipped, ~47s wall)
Pre-commit hooks pass (ruff, mypy, flake8, codespell, pyupgrade)

Quality Report

Changes: 6 files changed, 31 insertions(+), 12 deletions(-)

Code scan: clean

Tests: passed (4 PASSED)

Branch hygiene: clean

Generated by Kōan post-mission quality pipeline

codecov · 2026-05-18T00:04:53Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.76%. Comparing base (4bae30a) to head (98c3785).

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #1711   +/-   ##
=======================================
  Coverage   99.76%   99.76%           
=======================================
  Files          33       33           
  Lines        3426     3427    +1     
  Branches      471      471           
=======================================
+ Hits         3418     3419    +1     
  Misses          5        5           
  Partials        3        3

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

codspeed-hq · 2026-05-18T00:04:54Z

Merging this PR will not alter performance

✅ 6 untouched benchmarks

_{Comparing bluetoothbot:koan/speed-up-register-random-wait (98c3785) with master (4bae30a)}

bdraco · 2026-05-18T00:14:03Z

all the windows TC tests fail with this pr

    zeroconf:outgoing.py:471 now offsets = questions=1, answers=302, authorities=0, additionals=0
================================== FAILURES ===================================
__________________ test_tc_bit_defers_last_response_missing ___________________

    @patch.object(_listener, "_TC_DELAY_RANDOM_INTERVAL", (10, 20))
    def test_tc_bit_defers_last_response_missing():
        zc = Zeroconf(interfaces=["127.0.0.1"])
        _wait_for_start(zc)
        type_ = "_knowndefer._tcp.local."
        name = "knownname"
        name2 = "knownname2"
        name3 = "knownname3"
    
        registration_name = f"{name}.{type_}"
        registration2_name = f"{name2}.{type_}"
        registration3_name = f"{name3}.{type_}"
    
        desc = {"path": "/~paulsm/"}
        server_name = "ash-2.local."
        server_name2 = "ash-3.local."
        server_name3 = "ash-4.local."
    
        info = r.ServiceInfo(
            type_,
            registration_name,
            80,
            0,
            0,
            desc,
            server_name,
            addresses=[socket.inet_aton("10.0.1.2")],
        )
        info2 = r.ServiceInfo(
            type_,
            registration2_name,
            80,
            0,
            0,
            desc,
            server_name2,
            addresses=[socket.inet_aton("10.0.1.2")],
        )
        info3 = r.ServiceInfo(
            type_,
            registration3_name,
            80,
            0,
            0,
            desc,
            server_name3,
            addresses=[socket.inet_aton("10.0.1.2")],
        )
        zc.registry.async_add(info)
        zc.registry.async_add(info2)
        zc.registry.async_add(info3)
    
        protocol = zc.engine.protocols[0]
        now = r.current_time_millis()
        _clear_cache(zc)
        source_ip = "203.0.113.12"
    
        generated = r.DNSOutgoing(const._FLAGS_QR_QUERY)
        question = r.DNSQuestion(type_, const._TYPE_PTR, const._CLASS_IN)
        generated.add_question(question)
        for _ in range(300):
            # Add so many answers we end up with another packet
            generated.add_answer_at_time(info.dns_pointer(), now)
        generated.add_answer_at_time(info2.dns_pointer(), now)
        generated.add_answer_at_time(info3.dns_pointer(), now)
        packets = generated.packets()
        assert len(packets) == 4
        expected_deferred = []
    
        next_packet = r.DNSIncoming(packets.pop(0))
        expected_deferred.append(next_packet)
        threadsafe_query(zc, protocol, next_packet, source_ip, const._MDNS_PORT, Mock(), ())
        assert protocol._deferred[source_ip] == expected_deferred
        timer1 = protocol._timers[source_ip]
    
        next_packet = r.DNSIncoming(packets.pop(0))
        expected_deferred.append(next_packet)
        threadsafe_query(zc, protocol, next_packet, source_ip, const._MDNS_PORT, Mock(), ())
>       assert protocol._deferred[source_ip] == expected_deferred
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
E       KeyError: '203.0.113.12'

try harder

bdraco · 2026-05-18T00:14:07Z

@bluetoothbot rr

bluetoothbot · 2026-05-18T00:28:39Z

PR Review — test: shave register/shutdown/TC waits on loopback

The _REGISTER_RANDOM_INTERVAL extraction is clean and the quick_timing fixture change is the right shape. The test_integration race fix (waiting for the first browser query before registering) is well-reasoned and the explanatory comment correctly cites §7.1 known-answer suppression. The test_shutdown_loop patch is a straightforward test-only tightening.

The blocker is test_tc_bit_defers_last_response_missing: it's failing on Windows per @bdraco. The combination of a 10-20ms patched defer interval and a halved cleanup poll budget (800ms → 400ms) doesn't survive Windows' ~15.6ms scheduler tick. Either restore the original cleanup budget or widen the patched random interval (e.g., (20, 50)ms) so the test's timing assumptions hold across the full CI matrix. Once that's resolved this is good to merge.

🔴 Blocking

1. Cleanup poll timeout halved — too tight for Windows (`tests/test_core.py`, L734-736)

The cleanup poll budget was reduced from 8×100ms=800ms to 20×20ms=400ms — a 50% cut. Combined with the patched _TC_DELAY_RANDOM_INTERVAL=(10, 20), this is failing on Windows (per @bdraco's report).

Two Windows-specific problems compound here:

Coarse timer resolution. Windows' default scheduler tick is ~15.6ms. time.sleep(0.02) typically sleeps ~31ms (next tick), so 20 iterations actually take ~620ms — but the asyncio scheduler driving the TC defer timer has the same coarseness, and a 10-20ms call_later may not fire until 15-30ms+. Under load this can easily push the cleanup past the new 400ms ceiling.
The defer timer is asyncio-driven, not time.sleep-driven. The poll loop runs on the test thread while _TC_DELAY_RANDOM_INTERVAL is consumed by the asyncio event loop's call_later. On Windows the ProactorEventLoop (or SelectorEventLoop) timer slot is bound to the system tick, so squeezing the random window to (10, 20)ms doesn't proportionally reduce when the timer fires — it just floors at the tick boundary.

Suggested fix: either keep the original 800ms cleanup budget (use range(40) × sleep(0.02) if you want finer-grained early-exit), or widen the patch to something like (20, 50)ms which is still 10x faster than the production (400, 500)ms but well clear of the Windows tick.

    for _ in range(20):
        time.sleep(0.02)
        if source_ip not in protocol._timers and source_ip not in protocol._deferred:
            break

🟡 Important

1. Patched interval is below practical Windows asyncio timer resolution (`tests/test_core.py`, L634)

_TC_DELAY_RANDOM_INTERVAL=(10, 20) is shorter than the Windows asyncio scheduler tick (~15.6ms). On Linux this is fine; on Windows the actual fire time will be quantised to the next tick boundary, so the variance is much larger than the (10, 20) range suggests — making the cleanup poll's worst-case timing harder to bound.

A value like (20, 50) or (30, 60) would still cut the production wait by an order of magnitude while staying clear of the Windows tick floor, and would make the test's timing assumptions hold consistently across CI matrices.

@patch.object(_listener, "_TC_DELAY_RANDOM_INTERVAL", (10, 20))

Checklist

Cross-platform timing safety — critical ＃1
Production change matches RFC default
Race fix in test_integration correctly addresses §7.1 KAS
Test docstrings/comments don't pad with rationale
No hand-edits to generated .c files
No Co-Authored-By trailers

Summary

The _REGISTER_RANDOM_INTERVAL extraction is clean and the quick_timing fixture change is the right shape. The test_integration race fix (waiting for the first browser query before registering) is well-reasoned and the explanatory comment correctly cites §7.1 known-answer suppression. The test_shutdown_loop patch is a straightforward test-only tightening.

The blocker is test_tc_bit_defers_last_response_missing: it's failing on Windows per @bdraco. The combination of a 10-20ms patched defer interval and a halved cleanup poll budget (800ms → 400ms) doesn't survive Windows' ~15.6ms scheduler tick. Either restore the original cleanup budget or widen the patched random interval (e.g., (20, 50)ms) so the test's timing assumptions hold across the full CI matrix. Once that's resolved this is good to merge.

Automated review by Kōanca16a07

bluetoothbot · 2026-05-18T00:28:40Z

@bdraco: all the windows TC tests fail with this pr

The Windows failure is almost certainly the combination of two compounding factors in test_tc_bit_defers_last_response_missing:

_TC_DELAY_RANDOM_INTERVAL=(10, 20) is below the Windows scheduler tick (~15.6ms). The asyncio call_later driving the TC defer timer floors at the next tick, so the effective fire time is 15-30ms+ rather than the 10-20ms the patch suggests.
The cleanup poll budget was halved — range(8) × sleep(0.1) = 800ms became range(20) × sleep(0.02) = 400ms. Under Windows' coarser time.sleep granularity the wall clock is closer to 620ms, but the asyncio side is racing the same tick and can easily overshoot.

Two options to fix:

Keep the 800ms cleanup budget (range(40) × sleep(0.02) if you want the finer-grained early exit), or
Widen the patched random interval to something like (20, 50) or (30, 60) ms — still a 10x speedup over the production (400, 500) ms, but well clear of the Windows tick floor so the timing is deterministic across the CI matrix.

I'd lean toward the second — it preserves the speedup intent of the PR without making the cleanup poll do double duty as a slack budget for scheduler jitter.

`async_check_service` paid an inline `random.randint(150, 250)` ms wait on every `register_service` — RFC 6762 §8.1 thundering-herd avoidance for real networks, pure overhead on 127.0.0.1. Extracted as a module constant `_REGISTER_RANDOM_INTERVAL` and patched in the `quick_timing` fixture to (1, 5)ms alongside the existing _CHECK_TIME / _REGISTER_TIME / _UNREGISTER_TIME shaves. Also: - `test_shutdown_loop`: drop `.result(1)` → `.result(0.1)` and patch `_TASK_AWAIT_TIMEOUT` to 0.1s. The 1s waits were `shutdown_loop`'s outer timeout on a never-completing `asyncio.sleep(5)` — the test only needs the loop to be stoppable while a task is pending. - `test_tc_bit_defers_last_response_missing`: patch `_TC_DELAY_RANDOM_INTERVAL` (400-500ms, RFC 6762 §7.2 TC-bit deferral) to (10, 20)ms and tighten the cleanup poll loop. - `test_integration`: wait for the browser's first startup query to fire (with empty cache) before registering. Previously the test relied on the register's 150-250ms random wait being longer than the browser's 20-120ms first-query delay; with the new fast register that race flipped and the first query started seeing the known PTR via §7.1 known-answer suppression. Speedups on loopback (CPython 3.12): | test | before | after | | ----------------------------------------------------- | ------ | ----- | | test_shutdown_loop | 1.16s | 0.26s | | test_tc_bit_defers_last_response_missing | 0.79s | 0.32s | | test_async_service_registration_name_conflict | 0.72s | 0.10s | | test_name_conflicts | 0.50s | 0.10s | Plus a smaller transparent shave on every other `quick_timing` register-based test. Full suite: 338 passed, 2 skipped. No production behavior change — `_REGISTER_RANDOM_INTERVAL` still defaults to the RFC 6762 §8.1 value on real networks.

…ot fire the timer mid-assertion

bluetoothbot · 2026-05-18T00:33:07Z

Rebase with requested adjustments

Branch koan/speed-up-register-random-wait was rebased onto master and review feedback was applied.

Changes applied

CI fix: Applied pre-push CI fix
CI fix: Pre-push CI fix applied

Stats

5 files changed, 23 insertions(+), 9 deletions(-)

Actions performed

Already-solved check: negative (confidence=high, reasoning=No commit on master extracts _REGISTER_RANDOM_INTERVAL or patches it via quick_timing; recent sp)
Rebased koan/speed-up-register-random-wait onto upstream/master
Pre-push CI check: previous run #26006589202 failed
Applied pre-push CI fix
Pre-push CI fix applied
Force-pushed koan/speed-up-register-random-wait to origin
CI check enqueued in ## CI (async)

CI status

CI will be checked asynchronously.

Automated by Kōan

bluetoothbot added 2 commits May 18, 2026 00:29

test: widen TC-defer interval patch so Windows scheduling jitter cann…

6b84b72

…ot fire the timer mid-assertion

bluetoothbot force-pushed the koan/speed-up-register-random-wait branch from ca16a07 to 6b84b72 Compare May 18, 2026 00:33

bluetoothbot had a problem deploying to release May 18, 2026 00:36 — with GitHub Actions Error

fix: resolve CI failures on python-zeroconf#1711 (attempt 1)

98c3785

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test: shave register/shutdown/TC waits on loopback#1711

test: shave register/shutdown/TC waits on loopback#1711
bluetoothbot wants to merge 3 commits into
python-zeroconf:masterfrom
bluetoothbot:koan/speed-up-register-random-wait

bluetoothbot commented May 18, 2026 •

edited

Loading

Uh oh!

codecov Bot commented May 18, 2026 •

edited

Loading

Uh oh!

codspeed-hq Bot commented May 18, 2026 •

edited

Loading

Uh oh!

bdraco commented May 18, 2026

Uh oh!

bdraco commented May 18, 2026

Uh oh!

bluetoothbot commented May 18, 2026

Uh oh!

bluetoothbot commented May 18, 2026

Uh oh!

bluetoothbot commented May 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

bluetoothbot commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Why

How

Speedups (CPython 3.12, loopback)

Test plan

Quality Report

Uh oh!

codecov Bot commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

codspeed-hq Bot commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Merging this PR will not alter performance

Uh oh!

bdraco commented May 18, 2026

Uh oh!

bdraco commented May 18, 2026

Uh oh!

bluetoothbot commented May 18, 2026

PR Review — test: shave register/shutdown/TC waits on loopback

🔴 Blocking

🟡 Important

Checklist

Summary

Uh oh!

bluetoothbot commented May 18, 2026

Uh oh!

bluetoothbot commented May 18, 2026

Rebase with requested adjustments

Changes applied

Stats

CI status

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

bluetoothbot commented May 18, 2026 •

edited

Loading

codecov Bot commented May 18, 2026 •

edited

Loading

codspeed-hq Bot commented May 18, 2026 •

edited

Loading