Skip to content

test: fix flaky test_run_coro_with_timeout#1683

Merged
bdraco merged 2 commits into
python-zeroconf:masterfrom
bluetoothbot:koan/fix-flaky-run-coro-with-timeout
May 16, 2026
Merged

test: fix flaky test_run_coro_with_timeout#1683
bdraco merged 2 commits into
python-zeroconf:masterfrom
bluetoothbot:koan/fix-flaky-run-coro-with-timeout

Conversation

@bluetoothbot

@bluetoothbot bluetoothbot commented May 16, 2026

Copy link
Copy Markdown
Contributor

What

Make tests/utils/test_asyncio.py::test_run_coro_with_timeout reliable on slow / loaded CI runners.

Why

Closes #1663. The test patched _LOADED_SYSTEM_TIMEOUT to 0.0 and passed timeout=0.1 ms to run_coro_with_timeout, so concurrent.futures.Future.result() waited just 0.0001 seconds. On a loaded system the executor thread's .result() could time out before the event loop got a chance to start the scheduled coroutine, leaving the task nonlocal still None and tripping assert task is not None. Nixpkgs hit this intermittently on the 3.14 build.

How

  • Bump the outer timeout passed to run_coro_with_timeout to 50 ms — still well under the inner sleep(1), so EventLoopBlocked is still guaranteed, but the loop now has plenty of headroom to schedule the coroutine.
  • Lengthen the inner asyncio.sleep from 0.2 s to 1 s for the same headroom reason.
  • Add a threading.Event (task_created) the coroutine sets right after creating the inner task. After pytest.raises(EventLoopBlocked) catches, wait up to 1 s on it before asserting on task. This is the actual race fix; the timeout bump is belt-and-braces.
  • Drop the redundant inner assert task is not None.

Testing

  • poetry run pytest tests/utils/test_asyncio.py::test_run_coro_with_timeout --count=50 --no-cov → 50/50 pass.
  • poetry run pytest tests/utils/test_asyncio.py --no-cov → all 7 tests pass.
  • poetry run ruff check tests/utils/test_asyncio.py and ruff format --check clean.

Quality Report

Changes: 1 file changed, 8 insertions(+), 3 deletions(-)

Code scan: clean

Tests: failed (timeout (120s))

Branch hygiene: clean

Generated by Kōan post-mission quality pipeline

@codspeed-hq

codspeed-hq Bot commented May 16, 2026

Copy link
Copy Markdown

Merging this PR will not alter performance

✅ 6 untouched benchmarks


Comparing bluetoothbot:koan/fix-flaky-run-coro-with-timeout (fc99d27) with master (1f380b1)

Open in CodSpeed

@codecov

codecov Bot commented May 16, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.76%. Comparing base (1f380b1) to head (fc99d27).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1683   +/-   ##
=======================================
  Coverage   99.76%   99.76%           
=======================================
  Files          33       33           
  Lines        3401     3401           
  Branches      461      461           
=======================================
  Hits         3393     3393           
  Misses          5        5           
  Partials        3        3           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to make test_run_coro_with_timeout less flaky on slow CI runners by giving the event loop more time to schedule the coroutine before asserting on the created task.

Changes:

  • Adds a synchronization event to detect when the inner task is created.
  • Increases the coroutine timeout and inner sleep duration to reduce scheduling sensitivity.
  • Removes a redundant assertion inside the coroutine.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/utils/test_asyncio.py Outdated
@bdraco

bdraco commented May 16, 2026

Copy link
Copy Markdown
Member

@bluetoothbot rr

@bluetoothbot

Copy link
Copy Markdown
Contributor Author

PR Review — test: fix flaky test_run_coro_with_timeout

The timeout bump (0.1 → 50 ms outer, 0.2s → 1s inner) is the right move and is what's actually making the test reliable — 50ms of headroom is plenty for the loop to schedule the inner coroutine before EventLoopBlocked fires. The threading.Event 'belt-and-braces' addition, however, doesn't do what the comment says: because this is an @pytest.mark.asyncio test and the event must be set from the same loop we're blocking with wait(), the synchronization only works in the case where it isn't needed. If the scheduling race actually happens, the new mechanism makes the failure mode worse (1s hang + assert) rather than fixing it. Switch to asyncio.Event with await asyncio.wait_for(...), or wrap the existing wait in asyncio.to_thread. Not a blocker for merge if you're comfortable that the 50ms buffer alone suffices, but the comment and Event should be removed in that case to avoid implying a guarantee that isn't there. Copilot's earlier comment on this line is correct.


🟡 Important

1. `threading.Event.wait()` blocks the loop that needs to set the event (`tests/utils/test_asyncio.py`, L143)

This test is @pytest.mark.asyncio, so loop = asyncio.get_event_loop() is the same loop the test coroutine runs on. _saved_sleep_task is scheduled onto that loop via run_coroutine_threadsafe from the worker thread, and task_created.set() is called from inside it — i.e. from the loop thread itself.

threading.Event.wait(1.0) is a synchronous block on the loop thread. If _saved_sleep_task has not yet been scheduled by the time we hit this line (the exact race the PR description is trying to defend against), the loop cannot dispatch it during the 1s wait, so the event is never set and the assertion fails after a 1-second hang. The mechanism doesn't help in the scenario it claims to handle.

What is actually keeping this test green in the 50/50 rerun is the timeout bump from 0.1 ms → 50 ms: the executor thread now waits ~50ms before raising EventLoopBlocked, and on any reasonable system the loop schedules _saved_sleep_task well within that window. The explicit synchronization is decorative.

If you want the synchronization to be load-bearing, use asyncio.Event (created and .set() on the same loop) and await asyncio.wait_for(task_created.wait(), 1.0), which yields to the loop and lets it dispatch the pending _saved_sleep_task. Alternatively, wrap the existing threading.Event in await asyncio.to_thread(task_created.wait, 1.0) so the blocking wait runs off the loop thread.

Otherwise, this is just a timeout bump dressed up as a race fix — fine, but the comment block and task_created machinery overstate what is actually happening.

# Suggested fix using asyncio.Event:
task_created = asyncio.Event()
# ...inside _saved_sleep_task:
task_created.set()
# ...after pytest.raises block:
await asyncio.wait_for(task_created.wait(), 1.0)

Checklist

  • No bare except / silent error swallowing
  • Test isolation (no shared state)
  • Edge case coverage for the documented race — warning #1
  • No event-loop-blocking calls on the loop thread — warning #1
  • Conventional Commits subject (test: prefix)
  • No Co-Authored-By agent trailers
  • Comments document non-obvious why

Summary

The timeout bump (0.1 → 50 ms outer, 0.2s → 1s inner) is the right move and is what's actually making the test reliable — 50ms of headroom is plenty for the loop to schedule the inner coroutine before EventLoopBlocked fires. The threading.Event 'belt-and-braces' addition, however, doesn't do what the comment says: because this is an @pytest.mark.asyncio test and the event must be set from the same loop we're blocking with wait(), the synchronization only works in the case where it isn't needed. If the scheduling race actually happens, the new mechanism makes the failure mode worse (1s hang + assert) rather than fixing it. Switch to asyncio.Event with await asyncio.wait_for(...), or wrap the existing wait in asyncio.to_thread. Not a blocker for merge if you're comfortable that the 50ms buffer alone suffices, but the comment and Event should be removed in that case to avoid implying a guarantee that isn't there. Copilot's earlier comment on this line is correct.


Automated review by Kōan79311dd

@bluetoothbot

Copy link
Copy Markdown
Contributor Author

Rebase with requested adjustments

Branch koan/fix-flaky-run-coro-with-timeout was rebased onto master and review feedback was applied.

Changes applied

  • Summary:
  • Replaced threading.Event with asyncio.Event for task_created per Copilot review feedback. threading.Event.wait() runs synchronously on the test's event loop — the same loop that must execute _saved_sleep_task() and call task_created.set(). If the inner coroutine had not started before the outer .result() timed out, the blocking wait would have held the loop for the full second, preventing the event from being set and reproducing the original race.
  • Swapped task_created.wait(1.0) for await asyncio.wait_for(task_created.wait(), 1.0), which yields to the loop so _saved_sleep_task can run, create the inner task, and set the event.

Stats

1 file changed, 9 insertions(+), 3 deletions(-)
Actions performed
  • Already-solved check: negative (confidence=high, reasoning=Master still has timeout=0.1 and sleep(0.2) without task_created Event; flaky race not yet fixed.)
  • Rebased koan/fix-flaky-run-coro-with-timeout onto upstream/master
  • Applied review feedback
  • Pre-push CI check: previous run passed
  • Force-pushed koan/fix-flaky-run-coro-with-timeout to origin
  • CI check enqueued in ## CI (async)

CI status

CI will be checked asynchronously.


Automated by Kōan

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.

@bdraco bdraco marked this pull request as ready for review May 16, 2026 17:12
@bdraco bdraco merged commit 277f80d into python-zeroconf:master May 16, 2026
47 checks passed
@bluetoothbot bluetoothbot deleted the koan/fix-flaky-run-coro-with-timeout branch May 16, 2026 19:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Flaky test test_run_coro_with_timeout

3 participants