Auto-retry flaky MP tests#7603

Open
ShaharNaveh wants to merge 3 commits intoRustPython:mainfrom
ShaharNaveh:flaky-mp-tests-retry

Conversation

@ShaharNaveh
Contributor

@ShaharNaveh ShaharNaveh commented Apr 14, 2026

This is pretty much what we were already doing, just manually, until now :/

Summary by CodeRabbit

  • Chores
    • Improved CI test reliability by adding retry logic for flaky tests (up to 5 attempts), grouping per-attempt logs for easier debugging, and exiting early on a successful attempt.
    • Explicitly set the step shell to bash to ensure consistent execution behavior.

@coderabbitai
Contributor

coderabbitai bot commented Apr 14, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: c4885f27-d5e7-4a44-bcaf-fd3e6a55c538

📥 Commits

Reviewing files that changed from the base of the PR and between f77e21b and 631fb45.

📒 Files selected for processing (1)
  • .github/workflows/ci.yaml
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/ci.yaml

📝 Walkthrough

Walkthrough

The CI workflow step "Run flaky MP CPython tests" was changed to retry the test command up to 5 times in a bash loop, grouping logs per attempt and exiting early on success; the step fails only after all attempts return non‑zero.
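The retry behaviour described above can be sketched as a small reusable function. This is an illustrative sketch, not the code in the PR: the name `retry` and its argument order are hypothetical, and the command is run inside an `if` so that a failure does not abort the script when the shell runs with `-e`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR): run a command up to
# max_attempts times, wrapping each attempt in a GitHub Actions log group.
retry() {
  local max_attempts=$1
  shift
  local attempt status=1
  for attempt in $(seq 1 "$max_attempts"); do
    echo "::group::Attempt ${attempt}"
    # Running the command as an `if` condition keeps `set -e` from
    # aborting the script before the status is recorded.
    if "$@"; then
      status=0
    else
      status=$?
    fi
    echo "::endgroup::"
    if [ "$status" -eq 0 ]; then
      return 0  # exit early on the first successful attempt
    fi
  done
  return 1  # every attempt returned non-zero
}
```

Under these assumptions a step could call it as `retry 5 target/release/rustpython -m test -j 1 --timeout 600 -v test_multiprocessing_fork`, and the step would fail only if all five attempts fail.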

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| CI Workflow Retry Logic: .github/workflows/ci.yaml | Replaced single test invocation with a bash loop that runs the test command up to 5 attempts, wraps each attempt with ::group::.../::endgroup, captures each attempt's exit status, exits early on success, and explicitly sets shell: bash. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

🐰 Hop, hop, I try once more,
Five little runs across the floor,
Grouped logs shining, tidy and bright,
If one succeeds — we end the night,
Flakes beware, I'll test you right!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Auto-retry flaky MP tests' clearly and concisely summarizes the main change: automating retries for flaky multiprocessing tests, matching the PR's core objective. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/ci.yaml:
- Line 295: The CI job contains a malformed GitHub Actions group terminator
string (currently echo "::endgroup:) — fix the shell command to use a properly
quoted, correctly-formed terminator (echo "::endgroup::") so Bash parsing and
GitHub Actions group closing work; locate the echo line that references
::endgroup and replace it with a balanced-quoted, double-colon terminator
(::endgroup::) to restore correct behavior.
- Around line 289-300: Fix the shell loop by closing the stray quote on the echo
that ends the group and by fixing the indentation of the command inside the
loop: change the broken echo "::endgroup: to echo "::endgroup::" and ensure the
command invoking target/release/rustpython -m test ... (the line starting with
target/release/rustpython) is indented consistently inside the for-attempt block
so the loop executes correctly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 7c4ba548-2d50-4bfb-9480-7ad79c1201af

📥 Commits

Reviewing files that changed from the base of the PR and between 2b1b0ba and f77e21b.

📒 Files selected for processing (1)
  • .github/workflows/ci.yaml

Comment thread .github/workflows/ci.yaml
Comment thread .github/workflows/ci.yaml Outdated
target/release/rustpython -m test -j 1 ${{ join(matrix.extra_test_args, ' ') }} --slowest --fail-env-changed --timeout 600 -v ${{ env.FLAKY_MP_TESTS }}
status=$?

echo "::endgroup:
Contributor


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail
python - <<'PY'
from pathlib import Path
line = Path(".github/workflows/ci.yaml").read_text().splitlines()[294]  # Line 295
print(f"Line 295: {line!r}")
if 'echo "::endgroup::"' not in line:
    raise SystemExit("Malformed GitHub Actions group terminator at Line 295.")
print("Line 295 looks valid.")
PY

Repository: RustPython/RustPython

Length of output: 178


Fix malformed GitHub Actions group terminator on Line 295.

Line 295 contains echo "::endgroup: which is syntactically broken. The string is missing a closing quote and the terminator is missing its trailing colon, breaking both Bash parsing and GitHub Actions log group handling.

Suggested fix
-            echo "::endgroup:
+            echo "::endgroup::"
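Both failure modes named above (an unterminated quote and a single-colon terminator) can be caught before pushing. The following is a minimal sketch of such a check, assuming the step's script has been saved to a standalone file with any `${{ ... }}` expressions already substituted; the function name `check_step_script` is hypothetical and not part of the PR.

```shell
#!/usr/bin/env bash
# Hypothetical pre-push check (not part of the PR).
# Succeeds only if the script parses and its log groups are balanced.
check_step_script() {
  local script=$1 opens closes
  # `bash -n` parses without executing; an unterminated quote fails here.
  bash -n "$script" 2>/dev/null || return 1
  # `grep -c` exits non-zero when the count is zero, hence the `|| true`.
  opens=$(grep -c -- '::group::' "$script" || true)
  closes=$(grep -c -- '::endgroup::' "$script" || true)
  # Every opened group needs a well-formed `::endgroup::` terminator.
  [ "$opens" -eq "$closes" ]
}
```

Counting lines is a deliberately crude heuristic: it flags `echo "::endgroup:"` because the pattern requires both trailing colons, but it does not verify nesting or ordering.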
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/ci.yaml at line 295, The CI job contains a malformed
GitHub Actions group terminator string (currently echo "::endgroup:) — fix the
shell command to use a properly quoted, correctly-formed terminator (echo
"::endgroup::") so Bash parsing and GitHub Actions group closing work; locate
the echo line that references ::endgroup and replace it with a balanced-quoted,
double-colon terminator (::endgroup::) to restore correct behavior.

@youknowone
Member

@fanninpm what do you think about this? Is there a better approach?

@fanninpm
Contributor

Have you tried looking at libregrtest's --rerun flag?

@ShaharNaveh
Contributor Author

ShaharNaveh commented Apr 14, 2026

Have you tried looking at libregrtest's --rerun flag?

I wasn't aware of that flag, that's cool. But from what I can tell, it only retries once? I haven't seen an option to configure the retry count. I look at failed CI runs quite a lot tbh, and sometimes it takes 3 attempts for the tests to pass :/

Comment thread .github/workflows/ci.yaml
Comment on lines +289 to +302
for attempt in $(seq 1 5); do
  echo "::group::Attempt ${attempt}"

  target/release/rustpython -m test -j 1 ${{ join(matrix.extra_test_args, ' ') }} --slowest --fail-env-changed --timeout 600 -v ${{ env.FLAKY_MP_TESTS }}
  status=$?

  echo "::endgroup:"

  if [ $status -eq 0 ]; then
    exit 0
  fi
done

exit 1
Contributor


If you want to loop through the FLAKY_MP_TESTS one by one, attempting multiple times, then here is one approach:

Suggested change
-for attempt in $(seq 1 5); do
-  echo "::group::Attempt ${attempt}"
-  target/release/rustpython -m test -j 1 ${{ join(matrix.extra_test_args, ' ') }} --slowest --fail-env-changed --timeout 600 -v ${{ env.FLAKY_MP_TESTS }}
-  status=$?
-  echo "::endgroup:"
-  if [ $status -eq 0 ]; then
-    exit 0
-  fi
-done
-exit 1
+super_status=0
+for test_module in ${{ env.FLAKY_MP_TESTS }}; do
+  for attempt in $(seq 1 5); do
+    echo "::group::${test_module} Attempt ${attempt}"
+    set +e
+    target/release/rustpython -m test -j 1 ${{ join(matrix.extra_test_args, ' ') }} --fail-env-changed --timeout 600 -v "${test_module}"
+    status=$?
+    set -e
+    echo "::endgroup::"
+    if [ $status -eq 0 ]; then
+      break
+    fi
+  done
+  echo "${test_module} exited with status ${status} on attempt ${attempt}."
+  super_status=$((super_status + status))
+done
+exit $super_status

@coderabbitai any tips for improvement?
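The `set +e` / `set -e` pair in the suggestion is worth a note. GitHub Actions documents that a step with an explicit `shell: bash` runs as `bash --noprofile --norc -eo pipefail {0}`, so a failing command ends the script before `status=$?` is reached unless errexit is suspended. The sketch below reproduces that locally; `run_like_actions` is a hypothetical helper and `false` stands in for a failing test run.

```shell
#!/usr/bin/env bash
# Unguarded: under `-e` the script stops at `false`,
# so the status is never captured and nothing is printed.
unguarded='false; status=$?; echo "status=${status}"'

# Guarded: errexit is suspended around the command, so the status survives.
guarded='set +e; false; status=$?; set -e; echo "status=${status}"'

run_like_actions() {
  # Assumption: mirrors the flags documented for an explicit `shell: bash`.
  bash --noprofile --norc -eo pipefail -c "$1"
}

run_like_actions "$guarded"                                       # prints status=1
run_like_actions "$unguarded" || echo "aborted with exit code $?"  # prints aborted with exit code 1
```

If that reading of the shell flags is right, the loop in the PR as written would stop at the first failing attempt rather than retrying, which is one reason to keep the guard.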

Contributor


@fanninpm, great approach — iterating per-module is more efficient since only the actually failing test gets retried rather than the entire suite. Here are a few suggestions:

  1. --slowest flag is missing — the original step includes --slowest; unless dropping it is intentional, it should be preserved.

  2. super_status accumulation can produce unexpected exit codes — adding raw exit codes means you might exit 3, which the env-pollution step later interprets as "test polluted the environment." A cleaner approach is to normalise to 0/1:

    [ $status -ne 0 ] && super_status=1
  3. ::endgroup:: typo fix — good catch; the original PR has echo "::endgroup:" (single trailing colon), while the correct GitHub Actions annotation is ::endgroup::.

  4. Log when all retries are exhausted — adding a message before accumulating the failure makes the log easier to read:

    done
    if [ $status -ne 0 ]; then
      echo "${test_module} failed all 5 attempts (last exit code: ${status})."
    else
      echo "${test_module} passed on attempt ${attempt}."
    fi
    [ $status -ne 0 ] && super_status=1
  5. Optional: add a short sleep between retries — since these tests are flaky due to multiprocessing/resource-contention issues, a brief sleep 2 between attempts may reduce noise:

    if [ $status -ne 0 ]; then
      sleep 2
    fi

Putting it together:

super_status=0
for test_module in ${{ env.FLAKY_MP_TESTS }}; do
  for attempt in $(seq 1 5); do
    echo "::group::${test_module} Attempt ${attempt}"

    set +e
    target/release/rustpython -m test -j 1 ${{ join(matrix.extra_test_args, ' ') }} --slowest --fail-env-changed --timeout 600 -v "${test_module}"
    status=$?
    set -e

    echo "::endgroup::"
    if [ $status -eq 0 ]; then
      break
    fi
    sleep 2
  done

  if [ $status -ne 0 ]; then
    echo "${test_module} failed all 5 attempts (last exit code: ${status})."
    super_status=1
  else
    echo "${test_module} passed on attempt ${attempt}."
  fi
done

exit $super_status
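To make the exit-code point concrete, here is a small sketch comparing the two aggregation strategies on hypothetical per-module statuses. Both function names are illustrative and do not appear in the PR.

```shell
#!/usr/bin/env bash
# Illustrative helpers only; neither appears in the PR.

# Sum raw exit statuses, as in the first version of the suggestion.
sum_statuses() {
  local total=0 s
  for s in "$@"; do
    total=$((total + s))
  done
  echo "$total"
}

# Collapse any non-zero status to 1, as in the normalised version.
normalise_statuses() {
  local s
  for s in "$@"; do
    if [ "$s" -ne 0 ]; then
      echo 1
      return 0
    fi
  done
  echo 0
}

sum_statuses 2 1        # prints 3, the value a later step reads as "environment changed"
normalise_statuses 2 1  # prints 1
normalise_statuses 0 0  # prints 0
```

Summing has a second pitfall: process exit statuses are 8 bits wide, so a sum that reaches 256 would be reported as 0.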

🧠 Learnings used
Learnt from: reactive-firewall
Repo: RustPython/RustPython PR: 0
File: :0-0
Timestamp: 2026-01-19T20:10:58.864Z
Learning: In the RustPython project's CI/CD workflows (PR `#6176`), bash scripts are preferred over Python or Rust for test orchestration and result aggregation to maintain hermetic testing—avoiding circular dependencies where Python tools would depend on the RustPython interpreter being tested, and avoiding visibility loss into Rust runtime issues when using Rust tooling.

Learnt from: ShaharNaveh
Repo: RustPython/RustPython PR: 5932
File: .github/workflows/comment-commands.yml:18-24
Timestamp: 2025-07-10T10:08:43.330Z
Learning: In GitHub Actions workflows for the RustPython project, the maintainer ShaharNaveh prefers to keep workflows simple and doesn't mind if steps fail when the desired state is already achieved (e.g., user already assigned to an issue). Avoid suggesting complex error handling for edge cases they don't consider problematic.

Learnt from: ShaharNaveh
Repo: RustPython/RustPython PR: 7379
File: .github/actions/install-macos-deps/action.yml:39-47
Timestamp: 2026-03-07T11:52:49.752Z
Learning: In the RustPython project, ShaharNaveh prefers that the `.github/actions/install-macos-deps/action.yml` composite action fails (rather than silently no-ops) when called with all package inputs set to `false`. This is intentional: such a call is considered a bug/misconfiguration and should surface as a failure rather than being handled gracefully.

Contributor Author


If you want to loop through the FLAKY_MP_TESTS one by one, attempting multiple times, then here is one approach:

@coderabbitai any tips for improvement?

What is the advantage of doing it this way over running everything at once multiple times?

Contributor


@ShaharNaveh, the main advantage of the per-module approach is efficiency when only a subset of tests is actually flaky on a given run:

| Scenario | All-at-once (current PR) | Per-module (fanninpm's suggestion) |
| --- | --- | --- |
| test_multiprocessing_fork fails, others pass | Reruns all 6 modules | Only reruns test_multiprocessing_fork |
| All 6 modules fail | Same cost | Same cost |
| Only 1 module is consistently flaky across attempts | Up to 5 full suite runs | Only that 1 module runs up to 5 times |

In practice, if these tests are flaky due to resource contention / timing issues it's likely that only one or two modules fail on any given run, not all six simultaneously. In that case the per-module approach avoids re-running the 5 modules that already passed.

That said, the per-module approach is noticeably more complex. Given your preference for simplicity, the current PR's approach is completely reasonable — it's correct, easy to read, and the extra CI cost of re-running a handful of passing tests a few times is probably negligible in practice.
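The cost difference can be put into rough numbers with a simplified model. This is an assumption for illustration, not a measurement: each module is taken to fail a fixed number of times before it first passes and to keep passing afterwards, which real flaky tests will not do reliably. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Simplified cost model (an assumption, not a measurement): each argument
# after max_attempts is how many times one module fails before it passes.

# Whole-suite retry: every attempt reruns every module.
all_at_once_runs() {
  local max=$1
  shift
  local worst=0 f attempts
  for f in "$@"; do
    if [ "$f" -gt "$worst" ]; then
      worst=$f
    fi
  done
  attempts=$((worst + 1))
  if [ "$attempts" -gt "$max" ]; then
    attempts=$max
  fi
  echo $((attempts * $#))
}

# Per-module retry: each module is rerun only until it passes.
per_module_runs() {
  local max=$1
  shift
  local total=0 f attempts
  for f in "$@"; do
    attempts=$((f + 1))
    if [ "$attempts" -gt "$max" ]; then
      attempts=$max
    fi
    total=$((total + attempts))
  done
  echo "$total"
}

# Six modules, one of which fails twice before passing, max 5 attempts.
all_at_once_runs 5 2 0 0 0 0 0   # prints 18
per_module_runs 5 2 0 0 0 0 0    # prints 8
```

When every module fails on every attempt, both functions return 30 for six modules, matching the "Same cost" row.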



@ShaharNaveh
Contributor Author

ShaharNaveh commented Apr 15, 2026

Have you tried looking at libregrtest's --rerun flag?

I'm leaning towards lowering the retry count to 3 (or 2) but adding the --rerun flag, so it would actually be able to run the tests up to 6 times, but the total runtime would be lower, if that makes sense.


3 participants