ci(fuzzer/postgres): build postgres-fuzzer from go-fuzzers branch for fix validation #4205
ayush3160 wants to merge 5 commits into
The postgres-fuzzer matrix on the fuzzer_linux workflow has been failing on `record_latest_replay_build` and `record_build_replay_latest` for several PRs (notably #4203). Local reproduction showed the root cause lives in the fuzzer itself, not in keploy, and the fix sits on keploy/go-fuzzers#fix/postgres-fuzzer-determinism (deterministic table pick + fixed TIMESTAMP/DATE base + panic-safe Add/Done so a pgx panic no longer strands the WaitGroup for 17 minutes until --api-timeout fires).

Temporarily swap the postgres-fuzzer download to a clone+build from that go-fuzzers branch so CI can validate the fix end-to-end before the go-fuzzers PR merges and re-uploads `releases/postgres-fuzzer/latest/postgres-fuzzer-linux-amd64.tar.gz` to S3.

Once that release runs, this step should be reverted back to the S3 download in a follow-up commit; the rest of the workflow (AWS setup, run step, artifact upload) is left untouched so the revert is one-step.

The clone uses the existing PRO_ACCESS_TOKEN secret, which is already plumbed through `prepare_and_run.yml` and used by the mongo fuzzer script the same way.

Signed-off-by: Ayush Sharma <kshitij3160@gmail.com>
Pull request overview
This PR updates the Linux fuzzer CI workflow to build the postgres-fuzzer binary from the keploy/go-fuzzers repo (branch fix/postgres-fuzzer-determinism) instead of downloading the “latest” tarball from S3, enabling end-to-end CI validation of fuzzer-side determinism/panic-safety fixes before promoting a new released binary.
Changes:
- Replace the Postgres fuzzer S3 download/extract step with a `git clone` + `go build ./postgres` build-from-source step.
- Wire `PRO_ACCESS_TOKEN` and `FUZZER_BRANCH` into the build step and tighten shell safety (`set -euo pipefail`).
- Keep downstream Postgres fuzzer test execution and artifact upload unchanged.
    +      - name: Build Postgres Fuzzer from go-fuzzers branch
             id: postgres_fuzzer
    +        env:
    +          PRO_ACCESS_TOKEN: ${{ secrets.PRO_ACCESS_TOKEN }}
    +          FUZZER_BRANCH: fix/postgres-fuzzer-determinism
             run: |
    -          set -e
    -          KEY="releases/postgres-fuzzer/latest/postgres-fuzzer-linux-amd64.tar.gz"
    -          aws s3 cp "s3://${{ vars.AWS_S3_BUCKET }}/${KEY}" .
    -          tar -xzf postgres-fuzzer-linux-amd64.tar.gz
    +          set -euo pipefail
    +          if [[ -z "${PRO_ACCESS_TOKEN}" ]]; then
    +            echo "::error::PRO_ACCESS_TOKEN secret is required to clone keploy/go-fuzzers"
    +            exit 1
    +          fi
    +          git clone --depth 1 --branch "${FUZZER_BRANCH}" \
    +            "https://${PRO_ACCESS_TOKEN}@github.com/keploy/go-fuzzers.git" \
    +            /tmp/go-fuzzers
    +          cd /tmp/go-fuzzers
    +          go build -trimpath -ldflags "-s -w" \
    +            -o "$GITHUB_WORKSPACE/postgres-fuzzer" ./postgres
🚀 Keploy Performance Test Results
Multi-Run Validation: Tests run 3 times, pipeline fails only if 2+ runs show regression.
Thresholds: P50 < 5ms, P90 < 15ms, P99 < 70ms, RPS >= 100 (±1% tolerance), Error Rate < 1%
✅ Result: PASSED - Only 0 out of 3 runs failed (threshold: 2)
P50, P90, and P99 percentiles naturally filter out outliers
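The multi-run rule in this report can be read as a simple 2-of-3 vote over per-run threshold checks. The sketch below is only an illustration of that reading, not the bot's actual implementation: `RunResult`, `regressed`, and `pipelineFails` are hypothetical names, and applying the ±1% tolerance as a relaxed RPS floor is an assumption.

```go
package main

import "fmt"

// RunResult holds the metrics of one performance run (hypothetical shape).
type RunResult struct {
	P50Ms, P90Ms, P99Ms float64
	RPS                 float64
	ErrorRate           float64 // fraction: 0.01 means 1%
}

const (
	maxP50Ms      = 5.0
	maxP90Ms      = 15.0
	maxP99Ms      = 70.0
	minRPS        = 100.0
	rpsTolerance  = 0.01 // assumption: tolerance lowers the RPS floor by 1%
	maxErrorRate  = 0.01
	failThreshold = 2 // pipeline fails when this many runs regress
)

// regressed reports whether a single run violates any threshold.
func regressed(r RunResult) bool {
	return r.P50Ms >= maxP50Ms ||
		r.P90Ms >= maxP90Ms ||
		r.P99Ms >= maxP99Ms ||
		r.RPS < minRPS*(1-rpsTolerance) ||
		r.ErrorRate >= maxErrorRate
}

// pipelineFails applies the vote: fail only if failThreshold or more runs regress.
func pipelineFails(runs []RunResult) bool {
	failed := 0
	for _, r := range runs {
		if regressed(r) {
			failed++
		}
	}
	return failed >= failThreshold
}

func main() {
	good := RunResult{P50Ms: 3, P90Ms: 10, P99Ms: 50, RPS: 120, ErrorRate: 0.001}
	slow := RunResult{P50Ms: 8, P90Ms: 10, P99Ms: 50, RPS: 120, ErrorRate: 0.001}
	fmt.Println("one slow run fails pipeline:", pipelineFails([]RunResult{good, slow, good}))
	fmt.Println("two slow runs fail pipeline:", pipelineFails([]RunResult{slow, slow, good}))
}
```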
Empty commit to spawn a fresh workflow run so we get an independent postgres-fuzzer sample. Will squash before merge. Signed-off-by: Ayush Sharma <kshitij3160@gmail.com>
Empty commit for the final independent postgres-fuzzer sample. Will squash before merge. Signed-off-by: Ayush Sharma <kshitij3160@gmail.com>
Last empty commit for the flake-check series. Will squash before merge. Signed-off-by: Ayush Sharma <kshitij3160@gmail.com>
Describe the changes that are made
- Switches the postgres-fuzzer step from the S3 download (`releases/postgres-fuzzer/latest/postgres-fuzzer-linux-amd64.tar.gz`) to an inline `git clone --depth 1 --branch fix/postgres-fuzzer-determinism keploy/go-fuzzers` + `go build ./postgres` step in `.github/workflows/fuzzer_linux.yml`.
- Uses the existing `PRO_ACCESS_TOKEN` secret (already plumbed via `prepare_and_run.yml:713` and consumed by the mongo fuzzer script the same way); no new secrets required.
- The rest of the workflow (AWS setup, `Run Postgres Fuzzer Test` step, artifact upload) is untouched, so reverting back to the S3 download is a single-step change once the go-fuzzers PR is merged and the release pipeline re-uploads the binary.

Why this change is wanted
The postgres-fuzzer matrix on `fuzzer_linux.yml` has been failing for cross-version configs on recent PRs (most notably on #4203): `record_latest_replay_build` and `record_build_replay_latest` consistently take ~18 min and fail with `Post "http://localhost:8080/fuzz": context deadline exceeded` from `keploy test`'s `--api-timeout=1000`. Local reproduction traced the root cause to the postgres fuzzer itself, not to keploy:

- `pickRandomTable` iterates `s.tables` with `for k := range`. Go randomizes map iteration order, so even with seed=42 record and replay pick different tables once `len(s.tables) >= 2`.
- `generateValueForType` returns `time.Now().Add(...)` for TIMESTAMP / DATE columns. Wall-clock time changes between record and replay, so bind values never match, which produces the `bind values diverged from every recorded invocation` flavour of error.
- `activeQueries.Add(1) / Done()` is not paired by `defer`. When pgx panics inside `Rows.Next` (`runtime.goPanicIndex(0x5, 0x5)` when a Keploy mock's row shape doesn't match the live query), `Done()` never runs, the WaitGroup stays at +1, and the deferred `cleanupSessions.Wait()` blocks until `--api-timeout=1000` fires ~16.94 min later.

The fix sits on `keploy/go-fuzzers#fix/postgres-fuzzer-determinism`. Local validation against the same `record_latest_replay_build` config that has been failing went from a 16.94 min hang to a 20.23 s pass (`test-set-0: PASSED, 2/2 testcases`).

This PR is the minimal-blast-radius way to verify that fix on real CI runners (same eBPF capture, same matrix, same `--api-timeout`) before promoting the binary through the release pipeline.

Links & References
Closes: NA
🔗 Related PRs
🐞 Related Issues
📄 Related Documents
What type of PR is this? (check all applicable)
Added e2e test pipeline?
Added comments for hard-to-understand areas?
Added to documentation?
Are there any sample code or steps to test the changes?
The PR's own CI run is the test. Compare these jobs on this PR vs the last `main` run / PR #4203:

If any of the three still fails, the residual signal is purely in the parser/integrations layer (no longer in the fuzzer's determinism / panic-safety).
Self Review done?
Any relevant screenshots, recordings or logs?
Local run with the patched fuzzer against the same `record_latest_replay_build` script CI uses:

For comparison, the stock fuzzer on the same script: