
[build] make release pipeline rerun-safe #17626

Merged
titusfortner merged 2 commits into trunk from rerun_build on Jun 5, 2026

@titusfortner (Member) commented Jun 4, 2026

Recently, the most common release failure has been one language failing to publish for some reason: the rest of the flow can't continue, and everything then has to be processed manually.

This change lets us click "Rerun failed jobs" without it breaking anything.

💥 What does this PR do?

  • Rescue the "already published" error from RubyGems and npm
  • Set the Python release action to use skip-existing
  • Always create the tag at the beginning of the workflow instead of relying on the GitHub Actions release step to do it
  • Fail (for now) when a Java rerun is attempted, because there isn't a simple way to make it rerun-safe

🔧 Implementation Notes

  • .NET already runs with --skip-duplicate

🤖 AI assistance

  • No substantial AI assistance used
  • AI assisted (complete below)
    • Tool(s):
    • What was generated:
    • I reviewed all AI output and can explain the change

💡 Additional Considerations

  • We still need a way to make the Java release rerun-safe

🔄 Types of changes

  • Cleanup (formatting, renaming)
  • Bug fix (backwards compatible)
  • New feature (non-breaking change which adds functionality and tests!)
  • Breaking change (fix or feature that would cause existing functionality to change)

@selenium-ci selenium-ci added the B-build Includes scripting, bazel and CI integrations label Jun 4, 2026
@qodo-code-review (Contributor)

Review Summary by Qodo

Make release pipeline rerun-safe with duplicate handling

✨ Enhancement 🐞 Bug fix


Walkthroughs

Description
• Add error handling for already-published packages across languages
  - Node.js: rescue npm publish errors for previously published versions
  - Ruby: extract gem publishing logic with duplicate version detection
  - Python: enable skip-existing flag in PyPI publish action
• Create release tags unconditionally at workflow start
  - Check if tag exists before creation to support workflow reruns
  - Remove language-specific tag creation condition
• Prevent Java release reruns with explicit error message
  - Fail fast on Java rerun attempts due to staging repo complexity
• Update workflow job dependencies to reflect tag creation changes
Diagram

```mermaid
flowchart LR
  A["Release Workflow"] --> B["Create Tag<br/>Check if exists"]
  A --> C["Publish Languages"]
  C --> D["Node.js<br/>Rescue npm errors"]
  C --> E["Ruby<br/>Rescue gem errors"]
  C --> F["Python<br/>Skip existing"]
  C --> G["Java<br/>Fail on rerun"]
  B --> H["Docs & Verification"]
  D --> H
  E --> H
  F --> H
```


File Changes

1. rake_tasks/node.rake Error handling +9/-1

Add error handling for npm duplicate versions

• Wrap the Bazel.execute call in a begin/rescue block to catch npm publish errors
• Skip release if error message matches "cannot publish over the previously published"
• Only rescue errors in non-dry-run mode to preserve dry-run error reporting

rake_tasks/node.rake


2. rake_tasks/ruby.rake Refactoring +11/-9

Extract gem publishing with duplicate detection

• Extract gem publishing logic into new publish_gem helper function
• Function rescues "Repushing of gem versions" errors and logs skip message
• Replace inline error handling with calls to publish_gem for both nightly and release tasks
• Simplifies code by centralizing duplicate version detection logic

rake_tasks/ruby.rake
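Both rake changes follow the same pattern: run the publish command, and if it fails with the registry's "already published" message (and this is not a dry run), log a skip instead of failing. The rake tasks implement this in Ruby with begin/rescue; what follows is only a shell sketch of the same idea. `publish_or_skip` is a hypothetical helper, not part of the Selenium build, and the message fragments it matches are the ones quoted in the file summaries.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the "rescue already published" pattern; the real
# logic lives in Ruby in rake_tasks/node.rake and rake_tasks/ruby.rake.

# publish_or_skip DRY_RUN PATTERN COMMAND [ARGS...]
#   Runs COMMAND. If it fails, its combined output contains PATTERN, and
#   DRY_RUN is not "true", the failure is treated as "already published"
#   and the function returns 0. Any other failure keeps its exit status.
publish_or_skip() {
  local dry_run="$1" pattern="$2"
  shift 2

  local output status
  # Capture stdout and stderr together so the registry's message can be matched.
  if output="$("$@" 2>&1)"; then
    status=0
  else
    status=$?
  fi
  printf '%s\n' "$output"

  if [ "$status" -eq 0 ]; then
    return 0
  fi
  # In dry-run mode every error is surfaced, as node.rake does.
  if [ "$dry_run" != "true" ] && grep -qF -- "$pattern" <<<"$output"; then
    echo "Version already published; skipping."
    return 0
  fi
  return "$status"
}
```

For example, `publish_or_skip false "Repushing of gem versions" gem push <gem file>` would turn a RubyGems duplicate-version rejection into a logged skip. Matching on message text is brittle if a registry rewords its error, but it keeps genuine failures (network, auth) fatal, which is the property a safe rerun needs.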


3. .github/workflows/release.yml ⚙️ Configuration changes +23/-9

Make workflow rerun-safe with tag and language checks

• Rename create-language-tag job to create-tag and remove language condition
• Add tag existence check before creation to support workflow reruns
• Add Java rerun prevention with explicit error message directing to staging repo
• Add skip-existing: true parameter to PyPI publish action
• Update all job dependencies from create-language-tag to create-tag
• Update Slack notification to reference new job name

.github/workflows/release.yml



qodo-code-review Bot commented Jun 4, 2026

Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (1) 📎 Requirement gaps (0) 🎨 UX issues (0)



Action required

1. Java rerun blocks other releases ✓ Resolved 🐞 Bug ≡ Correctness
Description
The publish matrix exits with an error on any rerun for the Java entry before it checks whether Java
is actually being released, so rerunning a ruby-, dotnet- or javascript-only release still fails because of Java.
This blocks reruns (run attempt > 1) of non-Java patch releases even though Java would otherwise be skipped.
Code

.github/workflows/release.yml[R145-148]

```diff
+        if [ "${{ matrix.language == 'java' && github.run_attempt != '1' }}" = "true" ]; then
+          echo "::error::Java release is not yet rerun-safe — check/drop the staging repo at https://central.sonatype.com/publishing/deployments and publish manually"
+          exit 1
+        fi
```
Evidence
The publish job always includes java in its matrix, and the new Java rerun guard runs before the check for
whether the selected release language matches the matrix entry. For a non-Java patch release,
parse-tag outputs a specific language (e.g., ruby), so Java should be skipped, but instead it fails
on run attempts > 1.

.github/workflows/release.yml[129-155]
.github/workflows/parse-release-tag.yml[39-73]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution.

## Issue description
The publish matrix currently fails on rerun attempts (`github.run_attempt != '1'`) for the Java matrix entry even when the workflow is releasing a different language (e.g., `selenium-4.28.1-ruby`). This happens because the Java rerun guard runs before the “is this language selected?” check.

## Issue Context
- `publish` runs a matrix over `[java, ruby, dotnet, javascript]`.
- For patch releases, `parse-tag` sets `outputs.language` to the tag suffix (e.g., `ruby`).
- On reruns, the Java matrix entry should *skip* when `outputs.language != 'java'`, but it currently fails early.

## Fix
Move or gate the Java rerun guard so it only triggers when Java is actually being released:

Option A (recommended): nest the Java guard inside the selected-language branch:
```bash
# Only act when this matrix entry's language is the one being released.
if [ "${{ needs.parse-tag.outputs.language == 'all' || needs.parse-tag.outputs.language == matrix.language }}" = "true" ]; then
  # Java cannot be republished safely yet, so fail loudly on reruns.
  if [ "${{ matrix.language == 'java' && github.run_attempt != '1' }}" = "true" ]; then
    echo "::error::Java release is not yet rerun-safe — ..."
    exit 1
  fi
  ./go ${{ matrix.language }}:release
else
  echo skipping
fi
```

Option B: add the selected-language predicate directly into the Java guard condition.
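Option B can be sketched as a small decision function. `publish_action` and its three result strings are hypothetical names; in the workflow its arguments would be `needs.parse-tag.outputs.language`, `matrix.language` and `github.run_attempt`, passed in as plain strings so the logic can be checked outside GitHub Actions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of Option B: the Java rerun guard only fires when
# Java is actually selected for release.

# publish_action SELECTED_LANGUAGE MATRIX_LANGUAGE RUN_ATTEMPT
# Prints one of: fail-java-rerun, release, skip
publish_action() {
  local selected="$1" lang="$2" attempt="$3"

  # Is this matrix entry part of the current release?
  local is_selected=false
  if [ "$selected" = "all" ] || [ "$selected" = "$lang" ]; then
    is_selected=true
  fi

  if [ "$is_selected" = true ] && [ "$lang" = "java" ] && [ "$attempt" != "1" ]; then
    echo "fail-java-rerun"   # Java selected and this is a rerun
  elif [ "$is_selected" = true ]; then
    echo "release"
  else
    echo "skip"              # e.g. Java entry during a ruby-only patch release
  fi
}
```

A step would then `case` on the result: print the `::error::` message and `exit 1` for `fail-java-rerun`, run `./go <language>:release` for `release`, and echo `skipping` otherwise. Keeping the predicate in one place makes it harder for the selection check and the Java guard to drift apart again.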

## Fix Focus Areas
- .github/workflows/release.yml[144-153]




Remediation recommended

2. TAG unvalidated before tagging 📘 Rule violation ≡ Correctness
Description
The new create-tag step assumes TAG and SHA are non-empty and valid before calling gh api to
create a git ref. If either value is missing or empty, the workflow may try to create an
incorrect ref or fail in a non-obvious way, which violates the requirement to validate CI preconditions
and fail safely.
Code

.github/workflows/release.yml[R78-88]

```diff
          GH_TOKEN: ${{ secrets.SELENIUM_CI_TOKEN }}
          TAG: ${{ needs.parse-tag.outputs.tag }}
          SHA: ${{ github.event.pull_request.merge_commit_sha || github.sha }}
        run: |
+          if gh api "/repos/${{ github.repository }}/git/ref/tags/${TAG}" >/dev/null 2>&1; then
+            echo "Tag ${TAG} already exists — skipping creation."
+            exit 0
+          fi
          gh api -X POST /repos/${{ github.repository }}/git/refs \
            -f ref="refs/tags/${TAG}" \
            -f sha="${SHA}"
```
Evidence
Rule 10 requires validating preconditions/inputs in CI automation and failing safely on errors. The
added tag-creation script uses TAG/SHA directly in gh api calls without checking that they are
set/non-empty before proceeding.

.github/workflows/release.yml[78-88]
Best Practice: Learned patterns

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution.

## Issue description
The workflow creates a git tag using `TAG` and `SHA` without validating they are present and well-formed.

## Issue Context
Per CI/automation safety expectations, required inputs should be validated explicitly so reruns and unusual triggers fail deterministically with actionable errors.

## Fix Focus Areas
- .github/workflows/release.yml[78-88]
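One way to address this is to validate both inputs before touching the API, then keep the existing idempotent lookup-then-create logic. This is a sketch under assumptions: `validate_tag_inputs` and `create_tag_if_missing` are hypothetical helpers, and the tag pattern is inferred from the example tag `selenium-4.28.1-ruby` mentioned earlier in this review, so it may need adjusting to the real release tag format.

```shell
#!/usr/bin/env bash
# Sketch only: fail fast with an actionable error when TAG or SHA is unusable,
# then create the tag ref idempotently with the same gh api calls as the diff.

# ASSUMPTION: release tags look like "selenium-4.28.1" or "selenium-4.28.1-ruby".
TAG_PATTERN='^selenium-[0-9]+\.[0-9]+\.[0-9]+(-[a-z]+)?$'
# A full git commit SHA-1 is 40 lowercase hex characters.
SHA_PATTERN='^[0-9a-f]{40}$'

validate_tag_inputs() {
  local tag="$1" sha="$2"
  if [ -z "$tag" ] || [ -z "$sha" ]; then
    echo "::error::TAG and SHA must both be set (TAG='${tag}', SHA='${sha}')"
    return 1
  fi
  if ! [[ "$tag" =~ $TAG_PATTERN ]]; then
    echo "::error::TAG '${tag}' does not match ${TAG_PATTERN}"
    return 1
  fi
  if ! [[ "$sha" =~ $SHA_PATTERN ]]; then
    echo "::error::SHA '${sha}' is not a full 40-character commit SHA"
    return 1
  fi
}

# create_tag_if_missing REPO TAG SHA  (requires an authenticated gh CLI)
create_tag_if_missing() {
  local repo="$1" tag="$2" sha="$3"
  validate_tag_inputs "$tag" "$sha" || return 1
  # Rerun case: the tag was already created by an earlier attempt.
  if gh api "/repos/${repo}/git/ref/tags/${tag}" >/dev/null 2>&1; then
    echo "Tag ${tag} already exists; skipping creation."
    return 0
  fi
  gh api -X POST "/repos/${repo}/git/refs" \
    -f ref="refs/tags/${tag}" \
    -f sha="${sha}"
}
```

In the step this would be called as `create_tag_if_missing "${{ github.repository }}" "$TAG" "$SHA"`. As in the original script, any lookup failure (not only a missing tag) falls through to the POST, which then fails visibly rather than silently.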




Copilot AI left a comment

Pull request overview

Makes the release workflow more rerun-safe so “Rerun failed jobs” can be used without forcing manual cleanup across languages.

Changes:

  • Ruby/Node release tasks tolerate “already published” failures and skip instead of failing the workflow.
  • PyPI publish step uses skip-existing to avoid failing on reruns.
  • Workflow creates the release tag explicitly and adds a Java rerun guard (fails on rerun attempts).

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| rake_tasks/ruby.rake | Adds a helper to skip publishing when a gem version is already published. |
| rake_tasks/node.rake | Skips npm publishing when the version is already published (non-dry-run). |
| .github/workflows/release.yml | Adds explicit tag creation, PyPI skip-existing, and rerun behavior adjustments. |

qodo-code-review Bot commented Jun 4, 2026

Code review by qodo was updated up to the latest commit 92a9dbf

qodo-code-review Bot commented Jun 4, 2026

CI Feedback 🧐

(Feedback updated until commit 92a9dbf)

A test triggered by this PR failed. Here is an AI-generated analysis of the failure:

Action: CI Success

Failed stage: Verify required jobs succeeded [❌]

Failed test name: ""

Failure summary:

The workflow failed because a guard step in the CI Success job intentionally exited with code 1
after detecting that one or more required jobs had failed.
- The step prints "One or more required jobs failed" and then runs exit 1 (log lines 27–35).
- The actual failure therefore happened in an earlier required job; this log does not show which job or test failed.

Relevant error logs:
```
1:  ##[group]Runner Image Provisioner
2:  Hosted Compute Agent
...
14:  Image: ubuntu-24.04
15:  Version: 20260525.161.1
16:  Included Software: https://github.com/actions/runner-images/blob/ubuntu24/20260525.161/images/ubuntu/Ubuntu2404-Readme.md
17:  Image Release: https://github.com/actions/runner-images/releases/tag/ubuntu24%2F20260525.161
18:  ##[endgroup]
19:  ##[group]GITHUB_TOKEN Permissions
20:  Contents: read
21:  Metadata: read
22:  ##[endgroup]
23:  Secret source: Actions
24:  Prepare workflow directory
25:  Prepare all required actions
26:  Complete job name: CI Success
27:  ##[group]Run if true; then
28:  if true; then
29:    echo "One or more required jobs failed"
30:    exit 1
31:  fi
32:  shell: /usr/bin/bash -e {0}
33:  ##[endgroup]
34:  One or more required jobs failed
35:  ##[error]Process completed with exit code 1.
36:  Cleaning up orphan processes
```

@titusfortner titusfortner merged commit 94f0032 into trunk Jun 5, 2026
36 of 38 checks passed
@titusfortner titusfortner deleted the rerun_build branch June 5, 2026 15:17

Labels

B-build Includes scripting, bazel and CI integrations

Projects

None yet


3 participants