Auto-retry flaky MP tests by ShaharNaveh · Pull Request #7603 · RustPython/RustPython

ShaharNaveh · 2026-04-14T08:11:11Z

This is pretty much what we have been doing manually until now :/

Summary by CodeRabbit

Chores
- Improved CI test reliability by adding retry logic for flaky tests (up to five sequential attempts), grouping logs per attempt for clearer diagnostics, and stopping early when a run succeeds to shorten pipeline time.
- Standardized the execution environment by explicitly setting the shell to bash to ensure consistent behavior and timeout handling across runs.

coderabbitai · 2026-04-14T08:11:26Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 0767bc9c-99d0-4266-b841-aa5733a0eb56

📥 Commits

Reviewing files that changed from the base of the PR and between ebad76f and 5663c74.

📒 Files selected for processing (1)

.github/workflows/ci.yaml

✅ Files skipped from review due to trivial changes (1)

.github/workflows/ci.yaml

📝 Walkthrough

The CI workflow step "Run flaky MP CPython tests" now runs the test command in a bash loop with up to 5 attempts, groups logs per attempt, captures each attempt's exit status, exits immediately on a successful attempt, and fails only if all attempts return non‑zero.

Changes

Cohort / File(s)	Summary
CI Workflow Retry Logic `.github/workflows/ci.yaml`	Replaced the single test invocation with a `bash` loop that runs the test command up to 5 times, wraps each attempt in `::group::...`/`::endgroup::`, captures each attempt's exit status (temporarily disabling `set -e`), exits early on success (`exit 0`), and fails once all attempts are exhausted (`exit 1`).
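
The behaviour described in the walkthrough can be sketched as a standalone bash function. This is a minimal sketch under stated assumptions, not the workflow's actual step: `retry`, `flaky_cmd`, and `calls` are hypothetical names, and `flaky_cmd` stands in for the real `target/release/rustpython -m test ...` invocation. Instead of toggling `set -e`, the sketch captures the exit status with `|| status=$?`, which also keeps errexit from aborting the loop.

```shell
# Hypothetical helper: run a command up to $1 times, stopping at the first success.
retry() {
  local max_attempts=$1
  shift
  local attempt status
  for attempt in $(seq 1 "$max_attempts"); do
    echo "::group::Attempt ${attempt}"
    # A failing command on the left of `||` does not trigger `set -e`,
    # so the exit status can be captured safely.
    status=0
    "$@" || status=$?
    echo "::endgroup::"
    if [ "$status" -eq 0 ]; then
      return 0
    fi
  done
  return 1
}

# Hypothetical stand-in for the flaky test command: fails twice, then passes.
calls=0
flaky_cmd() {
  calls=$((calls + 1))
  [ "$calls" -ge 3 ]
}

if retry 5 flaky_cmd; then
  echo "passed after ${calls} calls"
fi
```

Returning from a function rather than calling `exit` keeps the helper reusable; in a workflow step the caller would turn the return value into the step's exit status.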

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

🐰 I hop and try, attempt by attempt,
Five tidy groups where logs ferment,
One brave zero and I curl to rest,
Else I bound on, giving it my best,
CI carrots lined up, what an event!

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'Auto-retry flaky MP tests' directly and clearly describes the main change: adding automatic retry logic for flaky multiprocessing tests in the CI workflow.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/ci.yaml:
- Line 295: The CI job contains a malformed GitHub Actions group terminator
string (currently echo "::endgroup:) — fix the shell command to use a properly
quoted, correctly-formed terminator (echo "::endgroup::") so Bash parsing and
GitHub Actions group closing work; locate the echo line that references
::endgroup and replace it with a balanced-quoted, double-colon terminator
(::endgroup::) to restore correct behavior.
- Around line 289-300: Fix the shell loop by closing the stray quote on the echo
that ends the group and by fixing the indentation of the command inside the
loop: change the broken echo "::endgroup: to echo "::endgroup::" and ensure the
command invoking target/release/rustpython -m test ... (the line starting with
target/release/rustpython) is indented consistently inside the for-attempt block
so the loop executes correctly.
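
The unbalanced quote flagged in these comments is a syntax error that bash can detect without running anything. The following is a minimal sketch, assuming `bash` is on the `PATH`; `parses_ok`, `broken`, and `fixed` are hypothetical names used only for illustration. It checks syntax only: the single-colon form `echo "::endgroup:"` parses fine, so the missing colon has to be caught by review.

```shell
# Hypothetical helper: succeed only if the snippet parses as valid bash.
# `bash -n` reads the commands without executing them.
parses_ok() {
  bash -n -c "$1" 2>/dev/null
}

broken='echo "::endgroup:'     # unbalanced double quote
fixed='echo "::endgroup::"'    # balanced quote, double-colon terminator

if parses_ok "$broken"; then
  echo "broken snippet: parses"
else
  echo "broken snippet: syntax error"
fi

if parses_ok "$fixed"; then
  echo "fixed snippet: parses"
else
  echo "fixed snippet: syntax error"
fi
```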

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 7c4ba548-2d50-4bfb-9480-7ad79c1201af

📥 Commits

Reviewing files that changed from the base of the PR and between 2b1b0ba and f77e21b.

📒 Files selected for processing (1)

.github/workflows/ci.yaml

youknowone · 2026-04-14T11:59:08Z

@fanninpm what do you think about this? Is there a better approach?

fanninpm · 2026-04-14T14:54:48Z

Have you tried looking at libregrtest's --rerun flag?

ShaharNaveh · 2026-04-14T15:07:47Z

> Have you tried looking at libregrtest's --rerun flag?

I wasn't aware of that flag, that's cool. But from what I can tell, it only retries once? I haven't seen an option to configure the retry count. I run into failed CIs quite a lot tbh, and sometimes it takes 3 attempts for the tests to pass :/

fanninpm · 2026-04-15T21:20:32Z

+          for attempt in $(seq 1 5); do
+            echo "::group::Attempt ${attempt}"
+
+            target/release/rustpython -m test -j 1 ${{ join(matrix.extra_test_args, ' ') }} --slowest --fail-env-changed --timeout 600 -v ${{ env.FLAKY_MP_TESTS }}
+            status=$?
+
+            echo "::endgroup:"
+
+            if [ $status -eq 0 ]; then
+              exit 0
+            fi
+          done
+
+          exit 1


If you want to loop through the FLAKY_MP_TESTS one by one, attempting multiple times, then here is one approach:

Suggested change

-          for attempt in $(seq 1 5); do
-            echo "::group::Attempt ${attempt}"
-
-            target/release/rustpython -m test -j 1 ${{ join(matrix.extra_test_args, ' ') }} --slowest --fail-env-changed --timeout 600 -v ${{ env.FLAKY_MP_TESTS }}
-            status=$?
-
-            echo "::endgroup:"
-
-            if [ $status -eq 0 ]; then
-              exit 0
-            fi
-          done
-
-          exit 1
+          super_status=0
+          for test_module in ${{ env.FLAKY_MP_TESTS }}; do
+            for attempt in $(seq 1 5); do
+              echo "::group::${test_module} Attempt ${attempt}"
+              set +e
+              target/release/rustpython -m test -j 1 ${{ join(matrix.extra_test_args, ' ') }} --fail-env-changed --timeout 600 -v "${test_module}"
+              status=$?
+              set -e
+              echo "::endgroup::"
+              if [ $status -eq 0 ]; then
+                break
+              fi
+            done
+            echo "${test_module} exited with status ${status} on attempt ${attempt}."
+            super_status=$((super_status + status))
+          done
+          exit $super_status

@coderabbitai any tips for improvement?

@fanninpm, great approach — iterating per-module is more efficient since only the actually failing test gets retried rather than the entire suite. Here are a few suggestions:

--slowest flag is missing — the original step includes --slowest; unless dropping it is intentional, it should be preserved.

super_status accumulation can produce unexpected exit codes — adding raw exit codes means you might exit 3, which the env-pollution step later interprets as "test polluted the environment." A cleaner approach is to normalise to 0/1:

[ $status -ne 0 ] && super_status=1

::endgroup:: typo fix — good catch; the original PR has echo "::endgroup:" (single trailing colon), while the correct GitHub Actions annotation is ::endgroup::.

Log when all retries are exhausted — adding a message before accumulating the failure makes the log easier to read:

    done
    if [ $status -ne 0 ]; then
      echo "${test_module} failed all 5 attempts (last exit code: ${status})."
    else
      echo "${test_module} passed on attempt ${attempt}."
    fi
    [ $status -ne 0 ] && super_status=1

Optional: add a short sleep between retries — since these tests are flaky due to multiprocessing/resource-contention issues, a brief sleep 2 between attempts may reduce noise:

    if [ $status -ne 0 ]; then
      sleep 2
    fi

Putting it together:

    super_status=0
    for test_module in ${{ env.FLAKY_MP_TESTS }}; do
      for attempt in $(seq 1 5); do
        echo "::group::${test_module} Attempt ${attempt}"
        set +e
        target/release/rustpython -m test -j 1 ${{ join(matrix.extra_test_args, ' ') }} --slowest --fail-env-changed --timeout 600 -v "${test_module}"
        status=$?
        set -e
        echo "::endgroup::"
        if [ $status -eq 0 ]; then
          break
        fi
        sleep 2
      done
      if [ $status -ne 0 ]; then
        echo "${test_module} failed all 5 attempts (last exit code: ${status})."
        super_status=1
      else
        echo "${test_module} passed on attempt ${attempt}."
      fi
    done
    exit $super_status
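
The difference between summing exit codes and normalising them to 0/1 can be illustrated in isolation. This is a minimal sketch with made-up data: the `statuses` array is hypothetical and simply assumes that two modules ended with exit codes 2 and 1 after their final attempts.

```shell
# Hypothetical final exit statuses of three modules (assumed values).
statuses=(0 2 1)

summed=0
normalised=0
for status in "${statuses[@]}"; do
  # Raw accumulation, as in the first version of the suggestion.
  summed=$((summed + status))
  # Normalised accumulation: any failure collapses to 1.
  if [ "$status" -ne 0 ]; then
    normalised=1
  fi
done

echo "summed=${summed} normalised=${normalised}"
```

With these assumed inputs the summed value is 3, which is the code the comment above says a later step would read as "test polluted the environment", while the normalised value stays at 1.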

🧠 Learnings used

Learnt from: reactive-firewall Repo: RustPython/RustPython PR: 0 File: :0-0 Timestamp: 2026-01-19T20:10:58.864Z Learning: In the RustPython project's CI/CD workflows (PR `#6176`), bash scripts are preferred over Python or Rust for test orchestration and result aggregation to maintain hermetic testing—avoiding circular dependencies where Python tools would depend on the RustPython interpreter being tested, and avoiding visibility loss into Rust runtime issues when using Rust tooling.

Learnt from: ShaharNaveh Repo: RustPython/RustPython PR: 5932 File: .github/workflows/comment-commands.yml:18-24 Timestamp: 2025-07-10T10:08:43.330Z Learning: In GitHub Actions workflows for the RustPython project, the maintainer ShaharNaveh prefers to keep workflows simple and doesn't mind if steps fail when the desired state is already achieved (e.g., user already assigned to an issue). Avoid suggesting complex error handling for edge cases they don't consider problematic.

Learnt from: ShaharNaveh Repo: RustPython/RustPython PR: 7379 File: .github/actions/install-macos-deps/action.yml:39-47 Timestamp: 2026-03-07T11:52:49.752Z Learning: In the RustPython project, ShaharNaveh prefers that the `.github/actions/install-macos-deps/action.yml` composite action fails (rather than silently no-ops) when called with all package inputs set to `false`. This is intentional: such a call is considered a bug/misconfiguration and should surface as a failure rather than being handled gracefully.

> If you want to loop through the FLAKY_MP_TESTS one by one, attempting multiple times, then here is one approach:
>
> @coderabbitai any tips for improvement?

What is the advantage of doing it this way over running everything at once multiple times?

@ShaharNaveh, the main advantage of the per-module approach is efficiency when only a subset of tests is actually flaky on a given run:

Scenario	All-at-once (current PR)	Per-module (fanninpm's suggestion)
`test_multiprocessing_fork` fails, others pass	Reruns all 6 modules	Only reruns `test_multiprocessing_fork`
All 6 modules fail	Same cost	Same cost
Only 1 module is consistently flaky across attempts	Up to 5 full suite runs	Only that 1 module runs up to 5 times

In practice, if these tests are flaky due to resource contention or timing issues, it's likely that only one or two modules fail on any given run, not all six simultaneously. In that case, the per-module approach avoids re-running the four or five modules that already passed.

That said, the per-module approach is noticeably more complex. Given your preference for simplicity, the current PR's approach is completely reasonable — it's correct, easy to read, and the extra CI cost of re-running a handful of passing tests a few times is probably negligible in practice.
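
The cost comparison can be made concrete with a little arithmetic. This is a rough sketch, not a measurement: it counts module executions rather than wall-clock time, takes the module count of 6 from the table above, and assumes (hypothetically) that exactly one module is flaky, that it passes on its third attempt, and that each all-at-once attempt runs every module.

```shell
modules=6          # modules in the flaky list, per the table above
flaky_passes_on=3  # assumption: the single flaky module passes on attempt 3

# All-at-once: every attempt reruns the whole list until the flaky module passes.
all_at_once=$((modules * flaky_passes_on))

# Per-module: the stable modules run once, only the flaky one is retried.
per_module=$(( (modules - 1) + flaky_passes_on ))

echo "all-at-once: ${all_at_once} module runs"
echo "per-module:  ${per_module} module runs"
```

Under these assumptions the all-at-once loop performs 18 module runs against 8 for the per-module loop; whether that gap matters depends on how long the passing modules take.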

ShaharNaveh · 2026-04-15T21:53:31Z

> Have you tried looking at libregrtest's --rerun flag?

I'm leaning towards lowering the retry count to 3 (or 2) but adding the --rerun flag, so it would actually have the possibility of retrying up to 6 times, but the total runtime would be lower, if that makes sense.
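
The "6 times max" figure follows from 3 outer attempts, each of which may rerun failures once. The sketch below only simulates that counting: `run_tests_once` and `run_with_rerun` are hypothetical stand-ins, not the real test command, and the single built-in rerun reflects how `--rerun` is described in this thread rather than a verified description of libregrtest.

```shell
runs=0

# Hypothetical stand-in for one execution of the failing tests (worst case: always fails).
run_tests_once() {
  runs=$((runs + 1))
  return 1
}

# Hypothetical stand-in for an invocation with one built-in rerun of failures.
run_with_rerun() {
  run_tests_once || run_tests_once
}

max_attempts=3
for attempt in $(seq 1 "$max_attempts"); do
  if run_with_rerun; then
    break
  fi
done

echo "worst case: ${runs} executions"
```

In the worst case this counts 6 executions of the failing tests, while the full test list is only started 3 times, which is where the lower total runtime would come from.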

ShaharNaveh · 2026-04-17T16:47:43Z

ty @fanninpm :)

Auto-retry flaky MP tests

f77e21b

coderabbitai bot reviewed Apr 14, 2026

ShaharNaveh added 2 commits April 14, 2026 12:11

Fix missing quote

2e78ffd

Fix indent

631fb45

fanninpm reviewed Apr 15, 2026

youknowone requested a review from fanninpm April 17, 2026 10:58

Merge remote-tracking branch 'upstream/main' into flaky-mp-tests-retry

ebad76f

fanninpm reviewed Apr 17, 2026

ShaharNaveh and others added 2 commits April 17, 2026 18:47

Update .github/workflows/ci.yaml

f3d7972

Co-authored-by: fanninpm <27117322+fanninpm@users.noreply.github.com>

Update .github/workflows/ci.yaml

5663c74

Co-authored-by: fanninpm <27117322+fanninpm@users.noreply.github.com>
