feat: add CI pipeline for go-memory-load-mysql, mongo, and grpc #4107

Open
pathakharshit wants to merge 116 commits into main from feat/add-go-memory-load-mysql-mongo-grpc-ci

@pathakharshit
Contributor

pathakharshit commented Apr 22, 2026

Describe the changes that are made

Added k6 load-test CI pipelines for MySQL, MongoDB, and gRPC sample apps,
mirroring the existing Postgres (go-memory-load) pipeline pattern. Also
enabled full record + replay for the existing Postgres pipeline (replay was
previously disabled).

Each new pipeline:

  • Has a dedicated sample app with deterministic store logic for stable mock capture and replay
  • Runs Keploy in record mode under k6 load, then replays captured mocks and validates correctness
  • Is gated on error rate (hard fail), latency (soft check), and the memory limit (hard fail)

Links & References

Closes: #

🔗 Related PRs

  • keploy/samples-go — feat/add-go-memory-load-mysql
  • keploy/samples-go — feat/add-go-memory-load-mongo
  • keploy/samples-go — feat/add-go-memory-load-grpc

🐞 Related Issues

NA

📄 Related Documents

NA

What type of PR is this?

  • 🔁 CI
  • ✅ Test
  • 🍕 Feature

Added e2e test pipeline?

  • 👍 yes

Added comments for hard-to-understand areas?

  • 👍 yes

Added to documentation?

  • 🙅 no documentation needed

Are there any sample code or steps to test the changes?

  • 👍 yes, mentioned below

Steps to test:

  1. Trigger the Golang On Docker workflow
  2. Verify the following matrix jobs pass:
    • go-memory-load-mysql (×3 configs)
    • go-memory-load-mongo (×3 configs)
    • go-memory-load-grpc (×3 configs)
    • go-memory-load (×3 configs) — now with replay enabled

Self Review done?

  • ✅ yes

Any relevant screenshots, recordings or logs?

NA

Copilot AI review requested due to automatic review settings April 22, 2026 20:26
Contributor

Copilot AI left a comment


Pull request overview

Adds CI coverage for additional Go “memory load” sample apps by wiring them into the existing golang_docker workflow and providing per-app docker runner scripts.

Changes:

  • Added new docker-based workflow scripts for go-memory-load-mysql, go-memory-load-mongo, and go-memory-load-grpc.
  • Enabled the replay phase in the existing go-memory-load script.
  • Expanded .github/workflows/golang_docker.yml matrix to run the new apps and switched samples-go checkout ref.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 10 comments.

File Description
.github/workflows/test_workflow_scripts/golang/go_memory_load_mysql/golang-docker.sh New CI runner script for MySQL load test app (record + load + replay) with memory monitoring.
.github/workflows/test_workflow_scripts/golang/go_memory_load_mongo/golang-docker.sh New CI runner script for Mongo load test app (record + load + replay) with memory monitoring.
.github/workflows/test_workflow_scripts/golang/go_memory_load_grpc/golang-docker.sh New CI runner script for gRPC load test app (record + load + replay) with memory monitoring.
.github/workflows/test_workflow_scripts/golang/go_memory_load/golang-docker.sh Un-commented replay stage to actually run replay in CI.
.github/workflows/golang_docker.yml Added new matrix entries for the new apps and changed the samples-go ref.


Comment on lines +368 to +372
# Extract the http_req_failed percentage, e.g. "3.26%" from:
# http_req_failed.................: 3.26% ✓ 10 ✗ 296
local fail_pct
fail_pct="$(grep -oP 'http_req_failed[.]*:\s+\K[0-9]+(\.[0-9]+)?' "$k6_log" | head -1 || true)"


Copilot AI Apr 22, 2026


check_k6_failure_rate relies on grep -P with \K, which is not available in all environments (and can vary across runners). Prefer a POSIX-ish parser (e.g., awk/sed) to extract the failure rate to avoid CI brittleness.
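A minimal sketch of the awk-based parser this comment suggests, assuming the k6 summary line has the shape quoted in the script (metric name, a run of dots, a colon, then a percentage field). The function names extract_fail_pct and extract_fail_pct_grpc are hypothetical; the real check_k6_failure_rate may differ. The gRPC wrapper keeps the script's fallback order (grpc_req_failed first, then http_req_failed).

```bash
#!/usr/bin/env bash
# Sketch: read a k6 "<metric>_failed" rate without grep -P / \K.
# Assumes summary lines shaped like the ones quoted in the script:
#   http_req_failed.................: 3.26% ✓ 10 ✗ 296
# Prints the number from the first "NN.NN%" field (e.g. 3.26), or
# nothing if the metric line is absent.
extract_fail_pct() {
  local metric="$1" log="$2"
  awk -v m="$metric" '
    # Metric name, optional run of dots, then a colon.
    $0 ~ (m "[.]*:") {
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^[0-9]+(\.[0-9]+)?%$/) {
          sub(/%$/, "", $i)  # strip the trailing percent sign
          print $i
          exit               # first match only, like "head -1"
        }
      }
    }
  ' "$log"
}

# gRPC variant: prefer grpc_req_failed, fall back to http_req_failed,
# matching the fallback order in the gRPC script.
extract_fail_pct_grpc() {
  local pct
  pct="$(extract_fail_pct grpc_req_failed "$1")"
  [ -n "$pct" ] || pct="$(extract_fail_pct http_req_failed "$1")"
  printf '%s\n' "$pct"
}
```

Under set -e, the call site can keep the existing `|| true` guard, since awk exits non-zero if the log file is missing.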

docker compose build

section "Cleaning previous artifacts"
sudo rm -rf keploy/

Copilot AI Apr 22, 2026


This cleanup uses sudo rm -rf keploy/ unconditionally. In environments where sudo isn’t available (or requires a password), the script will fail under set -e. Use the existing run_with_keploy_privileges helper (or a conditional command -v sudo) for cleanup to keep the script consistent and portable.

Suggested change
sudo rm -rf keploy/
run_with_keploy_privileges rm -rf keploy/
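If run_with_keploy_privileges is not available in a given script, the "conditional command -v sudo" alternative mentioned above could look like the sketch below. The helper name run_privileged is hypothetical, and the internals of the existing run_with_keploy_privileges helper are not shown in this PR. `sudo -n` makes sudo fail instead of prompting when a password would be required.

```bash
#!/usr/bin/env bash
# Hypothetical helper: use sudo only when it is needed and usable.
# - Already root: run the command directly.
# - sudo installed and usable without a password prompt (-n): use it.
# - Otherwise: run directly and let the command itself succeed or fail.
run_privileged() {
  if [ "$(id -u)" -eq 0 ]; then
    "$@"
  elif command -v sudo >/dev/null 2>&1 && sudo -n true 2>/dev/null; then
    sudo -n "$@"
  else
    "$@"
  fi
}
```

The cleanup step would then read `run_privileged rm -rf keploy/`.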

Comment on lines +368 to +372
# Extract the http_req_failed percentage, e.g. "3.26%" from:
# http_req_failed.................: 3.26% ✓ 10 ✗ 296
local fail_pct
fail_pct="$(grep -oP 'http_req_failed[.]*:\s+\K[0-9]+(\.[0-9]+)?' "$k6_log" | head -1 || true)"


Copilot AI Apr 22, 2026


check_k6_failure_rate relies on grep -P with \K, which is not available in all environments (and can vary across runners). Prefer a POSIX-ish parser (e.g., awk/sed) to extract the failure rate to avoid CI brittleness.

Comment on lines +442 to +450
section "Recording load-test traffic"
run_with_keploy_privileges "$RECORD_BIN" record -c "docker compose up" --container-name "$APP_CONTAINER_NAME" --memory-limit "$RECORD_MEMORY_LIMIT_MB" --enable-sampling --generate-github-actions=false 2>&1 | tee record.txt &
record_pid=$!
echo "Started Keploy record process with PID: $record_pid"

keploy_container="$(wait_for_keploy_container 120)"
echo "Detected Keploy container: $keploy_container"
# apply_keploy_memory_limit "$keploy_container"
start_memory_monitor "$keploy_container" "$record_pid" "record"

Copilot AI Apr 22, 2026


record_pid=$! captures the PID of tee (last process in the background pipeline), not the Keploy record process. As a result, start_memory_monitor will stop monitoring early and kill -TERM "$phase_pid" won’t terminate the recorder when a memory violation/OOM is detected. Capture the actual Keploy record PID (e.g., via pgrep/ps after start) or avoid the pipeline so $! refers to the recorder process, and pass that PID into the monitor.
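One way to "avoid the pipeline so $! refers to the recorder" is bash process substitution. Here the command is backgrounded directly and its output is fed to tee, so $! is the command's PID. This is a sketch with a hypothetical helper (start_with_log, returning via STARTED_PID). One caveat: if the first word is a shell function such as run_with_keploy_privileges, $! will be the PID of the subshell running that function, not of sudo or the Keploy binary beneath it. In that case a pgrep-style lookup may still be needed.

```bash
#!/usr/bin/env bash
# Sketch: background a command, tee its combined output to a log, and
# record the PID of the command itself rather than tee.
#
#   cmd 2>&1 | tee log &       -> $! is tee (last pipeline element)
#   cmd > >(tee log) 2>&1 &    -> $! is cmd
#
# The PID is returned via STARTED_PID instead of echo: capturing it with
# $(...) would block, because tee keeps that pipe open.
STARTED_PID=""
start_with_log() {
  local log="$1"
  shift
  "$@" > >(tee "$log") 2>&1 &
  STARTED_PID=$!
}
```

With a plain command, a memory monitor could then `kill -TERM "$STARTED_PID"` and signal the process that is actually running.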

docker compose build

section "Cleaning previous artifacts"
sudo rm -rf keploy/

Copilot AI Apr 22, 2026


This cleanup uses sudo rm -rf keploy/ unconditionally. In environments where sudo isn’t available (or requires a password), the script will fail under set -e. Use the existing run_with_keploy_privileges helper (or a conditional command -v sudo) for cleanup to keep the script consistent and portable.

Suggested change
sudo rm -rf keploy/
run_with_keploy_privileges rm -rf keploy/

Comment on lines +368 to +375
# Extract the grpc_req_failed percentage, e.g. "3.26%" from:
# grpc_req_failed.................: 3.26% ✓ 10 ✗ 296
# Fall back to http_req_failed for compatibility.
local fail_pct
fail_pct="$(grep -oP 'grpc_req_failed[.]*:\s+\K[0-9]+(\.[0-9]+)?' "$k6_log" | head -1 || true)"
if [ -z "$fail_pct" ]; then
fail_pct="$(grep -oP 'http_req_failed[.]*:\s+\K[0-9]+(\.[0-9]+)?' "$k6_log" | head -1 || true)"
fi

Copilot AI Apr 22, 2026


check_k6_failure_rate relies on grep -P with \K, which is not available in all environments (and can vary across runners). Prefer a POSIX-ish parser (e.g., awk/sed) to extract the failure rate to avoid CI brittleness.

Comment on lines +439 to +447
section "Recording load-test traffic"
run_with_keploy_privileges "$RECORD_BIN" record -c "docker compose up" --container-name "$APP_CONTAINER_NAME" --memory-limit "$RECORD_MEMORY_LIMIT_MB" --enable-sampling --generate-github-actions=false 2>&1 | tee record.txt &
record_pid=$!
echo "Started Keploy record process with PID: $record_pid"

keploy_container="$(wait_for_keploy_container 120)"
echo "Detected Keploy container: $keploy_container"
# apply_keploy_memory_limit "$keploy_container"
start_memory_monitor "$keploy_container" "$record_pid" "record"

Copilot AI Apr 22, 2026


record_pid=$! captures the PID of tee (last process in the background pipeline), not the Keploy record process. As a result, start_memory_monitor will stop monitoring early and kill -TERM "$phase_pid" won’t terminate the recorder when a memory violation/OOM is detected. Capture the actual Keploy record PID (e.g., via pgrep/ps after start) or avoid the pipeline so $! refers to the recorder process, and pass that PID into the monitor.

docker compose build

section "Cleaning previous artifacts"
sudo rm -rf keploy/

Copilot AI Apr 22, 2026


This cleanup uses sudo rm -rf keploy/ unconditionally. In environments where sudo isn’t available (or requires a password), the script will fail under set -e. Use the existing run_with_keploy_privileges helper (or a conditional command -v sudo) for cleanup to keep the script consistent and portable.

Suggested change
sudo rm -rf keploy/
run_with_keploy_privileges rm -rf keploy/

Comment on lines +439 to +447
section "Recording load-test traffic"
run_with_keploy_privileges "$RECORD_BIN" record -c "docker compose up" --container-name "$APP_CONTAINER_NAME" --memory-limit "$RECORD_MEMORY_LIMIT_MB" --enable-sampling --generate-github-actions=false 2>&1 | tee record.txt &
record_pid=$!
echo "Started Keploy record process with PID: $record_pid"

keploy_container="$(wait_for_keploy_container 120)"
echo "Detected Keploy container: $keploy_container"
# apply_keploy_memory_limit "$keploy_container"
start_memory_monitor "$keploy_container" "$record_pid" "record"

Copilot AI Apr 22, 2026


record_pid=$! captures the PID of tee (last process in the background pipeline), not the Keploy record process. As a result, start_memory_monitor will stop monitoring early and kill -TERM "$phase_pid" won’t terminate the recorder when a memory violation/OOM is detected. Capture the actual Keploy record PID (e.g., via pgrep/ps after start) or avoid the pipeline so $! refers to the recorder process, and pass that PID into the monitor.

Comment on lines 63 to 68
- name: Checkout the samples-go repository
uses: actions/checkout@v4
with:
repository: keploy/samples-go
ref: main
ref: feat/all-memory-load-apps
path: samples-go

Copilot AI Apr 22, 2026


This workflow now checks out keploy/samples-go from feat/all-memory-load-apps instead of main. CI will become dependent on a non-default branch that may be rebased/deleted, causing flaky failures. Prefer pinning to a commit SHA/tag, or merging the required samples into main and keeping CI on main.

@github-actions

🚀 Keploy Performance Test Results

Multi-Run Validation: Tests run 3 times, pipeline fails only if 2+ runs show regression.

Run P50 P90 P99 RPS Error Rate Status
1 2.85ms 3.67ms 5.34ms 100.00 0.00% ✅ PASS
2 2.67ms 3.39ms 5.12ms 100.02 0.00% ✅ PASS
3 2.51ms 3.27ms 4.71ms 100.03 0.00% ✅ PASS

Thresholds: P50 < 5ms, P90 < 15ms, P99 < 70ms, RPS >= 100 (±1% tolerance), Error Rate < 1%

Result: PASSED - Only 0 out of 3 runs failed (threshold: 2)

P50, P90, and P99 percentiles naturally filter out outliers
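The pass/fail rule above can be restated as a small awk gate. This is a sketch that mirrors only the rules stated in the bot comment, not the bot's actual implementation. The function name evaluate_runs and the whitespace-separated input format (p50_ms p90_ms p99_ms rps error_pct) are assumptions, and the sketch expects numeric values in every field. "RPS >= 100 (±1% tolerance)" is read as RPS >= 99.

```bash
#!/usr/bin/env bash
# Sketch of the multi-run gate. Input: one run per line,
#   p50_ms p90_ms p99_ms rps error_pct
# A run regresses if any threshold is violated:
#   P50 < 5, P90 < 15, P99 < 70 (ms); RPS >= 99 (100 minus 1%);
#   error rate < 1 (%).
# The gate fails (exit 1) only when 2 or more runs regress.
evaluate_runs() {
  awk '
    {
      fail = ($1 >= 5 || $2 >= 15 || $3 >= 70 || $4 < 99 || $5 >= 1)
      failed += fail
      printf "Run %d: %s\n", NR, (fail ? "FAIL" : "PASS")
    }
    END {
      printf "%d out of %d runs failed (threshold: 2)\n", failed, NR
      exit (failed >= 2 ? 1 : 0)
    }
  '
}
```

Under this reading, a run such as 99.99 RPS still passes, because it is within the 1% tolerance.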

@github-actions

🚀 Keploy Performance Test Results

Multi-Run Validation: Tests run 3 times, pipeline fails only if 2+ runs show regression.

Run P50 P90 P99 RPS Error Rate Status
1 2.48ms 3.24ms 4.75ms 100.00 0.00% ✅ PASS
2 2.47ms 3.18ms 4.51ms 100.00 0.00% ✅ PASS
3 2.4ms 3.1ms 4.73ms 100.02 0.00% ✅ PASS

Thresholds: P50 < 5ms, P90 < 15ms, P99 < 70ms, RPS >= 100 (±1% tolerance), Error Rate < 1%

Result: PASSED - Only 0 out of 3 runs failed (threshold: 2)

P50, P90, and P99 percentiles naturally filter out outliers

@github-actions

🚀 Keploy Performance Test Results

Multi-Run Validation: Tests run 3 times, pipeline fails only if 2+ runs show regression.

Run P50 P90 P99 RPS Error Rate Status
1 2.5ms 3.18ms 4.7ms 100.02 0.00% ✅ PASS
2 2.4ms 3ms 4.28ms 100.03 0.00% ✅ PASS
3 2.37ms 2.97ms 4.2ms 100.02 0.00% ✅ PASS

Thresholds: P50 < 5ms, P90 < 15ms, P99 < 70ms, RPS >= 100 (±1% tolerance), Error Rate < 1%

Result: PASSED - Only 0 out of 3 runs failed (threshold: 2)

P50, P90, and P99 percentiles naturally filter out outliers

@github-actions

🚀 Keploy Performance Test Results

Multi-Run Validation: Tests run 3 times, pipeline fails only if 2+ runs show regression.

Run P50 P90 P99 RPS Error Rate Status
1 2.4ms 3.02ms 4.67ms 100.02 0.00% ✅ PASS
2 2.36ms 2.93ms 4.39ms 100.02 0.00% ✅ PASS
3 2.34ms 2.93ms 4.29ms 100.00 0.00% ✅ PASS

Thresholds: P50 < 5ms, P90 < 15ms, P99 < 70ms, RPS >= 100 (±1% tolerance), Error Rate < 1%

Result: PASSED - Only 0 out of 3 runs failed (threshold: 2)

P50, P90, and P99 percentiles naturally filter out outliers

@github-actions

🚀 Keploy Performance Test Results

Multi-Run Validation: Tests run 3 times, pipeline fails only if 2+ runs show regression.

Run P50 P90 P99 RPS Error Rate Status
1 2.49ms 3.12ms 4.65ms 100.00 0.00% ✅ PASS
2 2.44ms 3.08ms 4.54ms 100.00 0.00% ✅ PASS
3 2.39ms 3ms 4.28ms 100.02 0.00% ✅ PASS

Thresholds: P50 < 5ms, P90 < 15ms, P99 < 70ms, RPS >= 100 (±1% tolerance), Error Rate < 1%

Result: PASSED - Only 0 out of 3 runs failed (threshold: 2)

P50, P90, and P99 percentiles naturally filter out outliers

@github-actions

🚀 Keploy Performance Test Results

Multi-Run Validation: Tests run 3 times, pipeline fails only if 2+ runs show regression.

Run P50 P90 P99 RPS Error Rate Status
1 2.45ms 3.07ms 4.63ms 100.00 0.00% ✅ PASS
2 N/A N/A N/A N/A N/A ✅ PASS

Thresholds: P50 < 5ms, P90 < 15ms, P99 < 70ms, RPS >= 100 (±1% tolerance), Error Rate < 1%

Result: PASSED - Only 0 out of 3 runs failed (threshold: 2)

P50, P90, and P99 percentiles naturally filter out outliers

@github-actions

🚀 Keploy Performance Test Results

Multi-Run Validation: Tests run 3 times, pipeline fails only if 2+ runs show regression.

Run P50 P90 P99 RPS Error Rate Status
1 2.66ms 3.44ms 4.67ms 100.00 0.00% ✅ PASS
2 2.57ms 3.3ms 4.44ms 100.02 0.00% ✅ PASS
3 2.54ms 3.27ms 4.31ms 100.00 0.00% ✅ PASS

Thresholds: P50 < 5ms, P90 < 15ms, P99 < 70ms, RPS >= 100 (±1% tolerance), Error Rate < 1%

Result: PASSED - Only 0 out of 3 runs failed (threshold: 2)

P50, P90, and P99 percentiles naturally filter out outliers

@github-actions

🚀 Keploy Performance Test Results

Multi-Run Validation: Tests run 3 times, pipeline fails only if 2+ runs show regression.

Run P50 P90 P99 RPS Error Rate Status
1 1.93ms 2.54ms 3.76ms 100.00 0.00% ✅ PASS
2 1.81ms 2.29ms 3.64ms 100.02 0.00% ✅ PASS
3 1.79ms 2.23ms 3.24ms 100.02 0.00% ✅ PASS

Thresholds: P50 < 5ms, P90 < 15ms, P99 < 70ms, RPS >= 100 (±1% tolerance), Error Rate < 1%

Result: PASSED - Only 0 out of 3 runs failed (threshold: 2)

P50, P90, and P99 percentiles naturally filter out outliers

@github-actions

🚀 Keploy Performance Test Results

Multi-Run Validation: Tests run 3 times, pipeline fails only if 2+ runs show regression.

Run P50 P90 P99 RPS Error Rate Status
1 2.68ms 3.49ms 4.77ms 100.00 0.00% ✅ PASS
2 2.56ms 3.33ms 4.58ms 100.00 0.00% ✅ PASS
3 2.48ms 3.18ms 4.68ms 100.00 0.00% ✅ PASS

Thresholds: P50 < 5ms, P90 < 15ms, P99 < 70ms, RPS >= 100 (±1% tolerance), Error Rate < 1%

Result: PASSED - Only 0 out of 3 runs failed (threshold: 2)

P50, P90, and P99 percentiles naturally filter out outliers

@github-actions

🚀 Keploy Performance Test Results

Multi-Run Validation: Tests run 3 times, pipeline fails only if 2+ runs show regression.

Run P50 P90 P99 RPS Error Rate Status
1 2.47ms 3.05ms 4.72ms 100.02 0.00% ✅ PASS
2 2.33ms 2.94ms 4.18ms 100.00 0.00% ✅ PASS
3 2.38ms 2.97ms 4.22ms 100.02 0.00% ✅ PASS

Thresholds: P50 < 5ms, P90 < 15ms, P99 < 70ms, RPS >= 100 (±1% tolerance), Error Rate < 1%

Result: PASSED - Only 0 out of 3 runs failed (threshold: 2)

P50, P90, and P99 percentiles naturally filter out outliers

@github-actions

🚀 Keploy Performance Test Results

Multi-Run Validation: Tests run 3 times, pipeline fails only if 2+ runs show regression.

Run P50 P90 P99 RPS Error Rate Status
1 2.48ms 3.1ms 4.57ms 100.00 0.00% ✅ PASS
2 2.45ms 3.08ms 4.52ms 100.00 0.00% ✅ PASS
3 2.57ms 3.4ms 5.17ms 100.02 0.00% ✅ PASS

Thresholds: P50 < 5ms, P90 < 15ms, P99 < 70ms, RPS >= 100 (±1% tolerance), Error Rate < 1%

Result: PASSED - Only 0 out of 3 runs failed (threshold: 2)

P50, P90, and P99 percentiles naturally filter out outliers

@github-actions

🚀 Keploy Performance Test Results

Multi-Run Validation: Tests run 3 times, pipeline fails only if 2+ runs show regression.

Run P50 P90 P99 RPS Error Rate Status
1 2.48ms 3.19ms 4.66ms 100.03 0.00% ✅ PASS
2 2.39ms 2.98ms 4.39ms 100.02 0.00% ✅ PASS
3 2.36ms 2.94ms 4.06ms 100.03 0.00% ✅ PASS

Thresholds: P50 < 5ms, P90 < 15ms, P99 < 70ms, RPS >= 100 (±1% tolerance), Error Rate < 1%

Result: PASSED - Only 0 out of 3 runs failed (threshold: 2)

P50, P90, and P99 percentiles naturally filter out outliers

@github-actions

🚀 Keploy Performance Test Results

Multi-Run Validation: Tests run 3 times, pipeline fails only if 2+ runs show regression.

Run P50 P90 P99 RPS Error Rate Status
1 2.45ms 3.14ms 4.71ms 100.00 0.00% ✅ PASS
2 2.38ms 3.06ms 4.38ms 100.02 0.00% ✅ PASS
3 2.55ms 3.22ms 4.39ms 100.00 0.00% ✅ PASS

Thresholds: P50 < 5ms, P90 < 15ms, P99 < 70ms, RPS >= 100 (±1% tolerance), Error Rate < 1%

Result: PASSED - Only 0 out of 3 runs failed (threshold: 2)

P50, P90, and P99 percentiles naturally filter out outliers

@github-actions

🚀 Keploy Performance Test Results

Multi-Run Validation: Tests run 3 times, pipeline fails only if 2+ runs show regression.

Run P50 P90 P99 RPS Error Rate Status
1 2.46ms 3.05ms 4.59ms 100.02 0.00% ✅ PASS
2 2.39ms 2.94ms 4.39ms 100.02 0.00% ✅ PASS
3 2.34ms 2.89ms 4.44ms 100.00 0.00% ✅ PASS

Thresholds: P50 < 5ms, P90 < 15ms, P99 < 70ms, RPS >= 100 (±1% tolerance), Error Rate < 1%

Result: PASSED - Only 0 out of 3 runs failed (threshold: 2)

P50, P90, and P99 percentiles naturally filter out outliers

@github-actions

🚀 Keploy Performance Test Results

Multi-Run Validation: Tests run 3 times, pipeline fails only if 2+ runs show regression.

Run P50 P90 P99 RPS Error Rate Status
1 2.4ms 3ms 4.57ms 100.00 0.00% ✅ PASS
2 2.35ms 2.88ms 4.37ms 100.00 0.00% ✅ PASS
3 2.3ms 2.83ms 4.13ms 100.00 0.00% ✅ PASS

Thresholds: P50 < 5ms, P90 < 15ms, P99 < 70ms, RPS >= 100 (±1% tolerance), Error Rate < 1%

Result: PASSED - Only 0 out of 3 runs failed (threshold: 2)

P50, P90, and P99 percentiles naturally filter out outliers

@github-actions

🚀 Keploy Performance Test Results

Multi-Run Validation: Tests run 3 times, pipeline fails only if 2+ runs show regression.

Run P50 P90 P99 RPS Error Rate Status
1 2.59ms 3.39ms 4.73ms 100.00 0.00% ✅ PASS
2 2.39ms 2.98ms 4.62ms 100.02 0.00% ✅ PASS
3 2.32ms 2.9ms 3.85ms 100.02 0.00% ✅ PASS

Thresholds: P50 < 5ms, P90 < 15ms, P99 < 70ms, RPS >= 100 (±1% tolerance), Error Rate < 1%

Result: PASSED - Only 0 out of 3 runs failed (threshold: 2)

P50, P90, and P99 percentiles naturally filter out outliers

@github-actions

🚀 Keploy Performance Test Results

Multi-Run Validation: Tests run 3 times, pipeline fails only if 2+ runs show regression.

Run P50 P90 P99 RPS Error Rate Status
1 2.4ms 3.06ms 4.41ms 100.00 0.00% ✅ PASS
2 2.32ms 2.9ms 4.13ms 100.02 0.00% ✅ PASS
3 2.32ms 2.92ms 3.94ms 100.03 0.00% ✅ PASS

Thresholds: P50 < 5ms, P90 < 15ms, P99 < 70ms, RPS >= 100 (±1% tolerance), Error Rate < 1%

Result: PASSED - Only 0 out of 3 runs failed (threshold: 2)

P50, P90, and P99 percentiles naturally filter out outliers

@github-actions

🚀 Keploy Performance Test Results

Multi-Run Validation: Tests run 3 times, pipeline fails only if 2+ runs show regression.

Run P50 P90 P99 RPS Error Rate Status
1 2.53ms 3.25ms 4.83ms 100.00 0.00% ✅ PASS
2 2.46ms 3.08ms 4.57ms 100.02 0.00% ✅ PASS
3 2.42ms 3.02ms 4.27ms 100.02 0.00% ✅ PASS

Thresholds: P50 < 5ms, P90 < 15ms, P99 < 70ms, RPS >= 100 (±1% tolerance), Error Rate < 1%

Result: PASSED - Only 0 out of 3 runs failed (threshold: 2)

P50, P90, and P99 percentiles naturally filter out outliers

@github-actions

github-actions Bot commented May 1, 2026

🚀 Keploy Performance Test Results

Multi-Run Validation: Tests run 3 times, pipeline fails only if 2+ runs show regression.

Run P50 P90 P99 RPS Error Rate Status
1 2.44ms 3.01ms 4.45ms 100.02 0.00% ✅ PASS
2 2.38ms 2.94ms 4.44ms 100.02 0.00% ✅ PASS
3 2.47ms 3.56ms 5.92ms 100.02 0.00% ✅ PASS

Thresholds: P50 < 5ms, P90 < 15ms, P99 < 70ms, RPS >= 100 (±1% tolerance), Error Rate < 1%

Result: PASSED - Only 0 out of 3 runs failed (threshold: 2)

P50, P90, and P99 percentiles naturally filter out outliers

@github-actions

🚀 Keploy Performance Test Results

Multi-Run Validation: Tests run 3 times, pipeline fails only if 2+ runs show regression.

Run P50 P90 P99 RPS Error Rate Status
1 2.63ms 3.37ms 5.04ms 100.00 0.00% ✅ PASS
2 2.58ms 3.6ms 4.84ms 100.00 0.00% ✅ PASS
3 2.66ms 3.86ms 4.97ms 100.02 0.00% ✅ PASS

Thresholds: P50 < 5ms, P90 < 15ms, P99 < 70ms, RPS >= 100 (±1% tolerance), Error Rate < 1%

Result: PASSED - Only 0 out of 3 runs failed (threshold: 2)

P50, P90, and P99 percentiles naturally filter out outliers

…sure)

Signed-off-by: Harshit Pathak <harshit07pathak@gmail.com>
@github-actions

🚀 Keploy Performance Test Results

Multi-Run Validation: Tests run 3 times, pipeline fails only if 2+ runs show regression.

Run P50 P90 P99 RPS Error Rate Status
1 2.93ms 3.95ms 5.26ms 100.02 0.00% ✅ PASS
2 2.84ms 3.98ms 5.26ms 100.00 0.00% ✅ PASS
3 2.87ms 4.11ms 5.5ms 100.00 0.00% ✅ PASS

Thresholds: P50 < 5ms, P90 < 15ms, P99 < 70ms, RPS >= 100 (±1% tolerance), Error Rate < 1%

Result: PASSED - Only 0 out of 3 runs failed (threshold: 2)

P50, P90, and P99 percentiles naturally filter out outliers

… RCA

Add fmt.Fprintf(os.Stderr, "TRACE-...") lines to follow each mongo mock
through AddMock and ResolveRange. Pairs with TRACE-MONGO-EMIT in
integrations to pin where the chronic-6 mongo mocks die before reaching
mocks.yaml in go-memory-load-mongo.

TRACE-ADDMOCK-IN: every entry, with mock kind/reqTs/lifetime + firstReqSeen
+ bound/closed state of outChan so we know which branch will fire.
TRACE-ADDMOCK-DROP-CLOSED: outChan-already-closed drop path.
TRACE-ADDMOCK-FORWARD-PREFIRST: pre-firstReqSeen forwarding to outChan.
TRACE-ADDMOCK-BUFFER: buffered path (firstReqSeen + outChan bound).
TRACE-RESOLVE: every ResolveRange call with window/before/after/flushed.
TRACE-RESOLVE-STALECUT: every stale-7s-cutoff drop with mock reqTs.

Diagnostic only. Will be removed once the cause is pinned.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Harshit Pathak <harshit07pathak@gmail.com>
@github-actions

🚀 Keploy Performance Test Results

Multi-Run Validation: Tests run 3 times, pipeline fails only if 2+ runs show regression.

Run P50 P90 P99 RPS Error Rate Status
1 3.6ms 4.83ms 6.84ms 100.02 0.00% ✅ PASS
2 3.44ms 4.77ms 6.65ms 99.99 0.00% ✅ PASS
3 3.67ms 5.43ms 7.4ms 100.02 0.00% ✅ PASS

Thresholds: P50 < 5ms, P90 < 15ms, P99 < 70ms, RPS >= 100 (±1% tolerance), Error Rate < 1%

Result: PASSED - Only 0 out of 3 runs failed (threshold: 2)

P50, P90, and P99 percentiles naturally filter out outliers

EmitMock has a pre-AddMock ctx.Err() check at session.go:392. If
sess.Ctx is cancelled during shutdown while the mongo decoder is still
flushing the chronic-6 teardown bytes, EmitMock returns silently on
this path and the mock never reaches syncMock.AddMock; in our V2
tracing it shows up as TRACE-MONGOV2-EMIT-DROP with err="context canceled".

Add TRACE-EMITMOCK-CTXDONE-DROP at the ctx-err return site so we can
see how many mocks die here and at what timestamps, to confirm or
falsify the hypothesis.

Diagnostic only — pure logging, no behavior change. Will be removed
once the cause is pinned.
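To get the "how many mocks die here" count from a captured log (for example record.txt, which the CI script tees), a tally of TRACE tags is one option. This is a sketch: count_trace_tags is a hypothetical name, and it assumes each tag is printed as a whole uppercase token such as TRACE-EMITMOCK-CTXDONE-DROP. grep -o is used here because GNU and BSD grep both support it, unlike -P.

```bash
#!/usr/bin/env bash
# Sketch: count occurrences of each TRACE-* diagnostic tag in a log file,
# most frequent first. Tags are matched as uppercase/digit/hyphen tokens.
count_trace_tags() {
  grep -o 'TRACE-[A-Z0-9-]*' "$1" | sort | uniq -c | sort -rn
}
```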

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Harshit Pathak <harshit07pathak@gmail.com>
@github-actions

🚀 Keploy Performance Test Results

Multi-Run Validation: Tests run 3 times, pipeline fails only if 2+ runs show regression.

Run P50 P90 P99 RPS Error Rate Status
1 2.53ms 3.32ms 4.88ms 100.02 0.00% ✅ PASS
2 2.5ms 3.38ms 4.66ms 100.00 0.00% ✅ PASS
3 2.52ms 3.51ms 4.98ms 100.02 0.00% ✅ PASS

Thresholds: P50 < 5ms, P90 < 15ms, P99 < 70ms, RPS >= 100 (±1% tolerance), Error Rate < 1%

Result: PASSED - Only 0 out of 3 runs failed (threshold: 2)

P50, P90, and P99 percentiles naturally filter out outliers

Previous pipeline created at 06:54:10 sat in a pending state with 0 jobs
for 15+ minutes. The "Prepare Binary and Run Workflows" matrix was never
generated, a symptom of a forge-config fetch timeout at pipeline-creation
time (per the keploy-ci-debug skill).

Empty retrigger forces GitHub to re-evaluate the workflow trigger.
No code change — diagnostic traces from da59f10 remain in effect.
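An empty retrigger commit like this one can be created with git's --allow-empty flag. This is a sketch: retrigger_ci is a hypothetical wrapper, and it assumes the checked-out branch is the PR branch. The -s flag adds the Signed-off-by trailer these commits carry.

```bash
#!/usr/bin/env bash
# Sketch: create an empty, signed-off commit that changes no files,
# so that pushing it re-triggers the PR's workflows.
retrigger_ci() {
  git commit --allow-empty -s -m "${1:-chore: retrigger CI}"
}
```

Running `git push` afterwards publishes the commit and fires the PR triggers again.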

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Harshit Pathak <harshit07pathak@gmail.com>
@github-actions

🚀 Keploy Performance Test Results

Multi-Run Validation: Tests run 3 times, pipeline fails only if 2+ runs show regression.

Run P50 P90 P99 RPS Error Rate Status
1 2.71ms 3.55ms 4.85ms 100.00 0.00% ✅ PASS
2 2.62ms 3.61ms 5.25ms 100.00 0.00% ✅ PASS
3 2.63ms 3.74ms 5.05ms 100.02 0.00% ✅ PASS

Thresholds: P50 < 5ms, P90 < 15ms, P99 < 70ms, RPS >= 100 (±1% tolerance), Error Rate < 1%

Result: PASSED - Only 0 out of 3 runs failed (threshold: 2)

P50, P90, and P99 percentiles naturally filter out outliers

Run gofmt to fix:
- import order: syncMock should come before pTls alphabetically
- struct field alignment in pendingTC

These were introduced in 00bbba6 (TC hold + agent-side pressure
check) and surfaced in golangci-lint's gofmt step. Pure formatting,
no behavior change.

This commit also serves as a fresh trigger for the prepare-and-run
workflow after two prior runs (26272536927, 26273608297) sat in a
GitHub Actions concurrency hold without generating their job matrix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Harshit Pathak <harshit07pathak@gmail.com>
@github-actions

🚀 Keploy Performance Test Results

Multi-Run Validation: Tests run 3 times, pipeline fails only if 2+ runs show regression.

Run P50 P90 P99 RPS Error Rate Status
1 3.04ms 4ms 5.71ms 100.02 0.00% ✅ PASS
2 3ms 4.23ms 5.92ms 100.02 0.00% ✅ PASS
3 3.07ms 4.45ms 6.27ms 100.02 0.00% ✅ PASS

Thresholds: P50 < 5ms, P90 < 15ms, P99 < 70ms, RPS >= 100 (±1% tolerance), Error Rate < 1%

Result: PASSED - Only 0 out of 3 runs failed (threshold: 2)

P50, P90, and P99 percentiles naturally filter out outliers

@github-actions

🚀 Keploy Performance Test Results

Multi-Run Validation: Tests run 3 times, pipeline fails only if 2+ runs show regression.

Run P50 P90 P99 RPS Error Rate Status
1 2.91ms 3.85ms 5.14ms 100.00 0.00% ✅ PASS
2 2.95ms 4.1ms 5.6ms 100.02 0.00% ✅ PASS
3 2.89ms 4ms 5.49ms 100.00 0.00% ✅ PASS

Thresholds: P50 < 5ms, P90 < 15ms, P99 < 70ms, RPS >= 100 (±1% tolerance), Error Rate < 1%

Result: PASSED - Only 0 out of 3 runs failed (threshold: 2)

P50, P90, and P99 percentiles naturally filter out outliers

Chronic-6 RCA confirmed on run 26277486420 (mongo-rbrb):
- last mongo mock at 08:49:20, last HTTP TC at 08:49:41 — 21s gap
- 6 HTTP TCs captured AFTER mongo's async decoder stopped emitting
- replay: connection EOF on get-orders-1..5 + get-analytics-top-products-1

The mongo v2 decoder runs on an async goroutine pipeline (encode.go
asyncMongoDecode). Under memory-limited recording with k6 load, the
pipeline can fall behind HTTP capture by 20+ seconds at shutdown.
The HTTP integration commits TCs as soon as the round-trip completes,
so when the recording window closes, HTTP captures the teardown TCs
but mongo has no time to drain its decode backlog — leaving orphan
TCs whose underlying mongo queries were never persisted to mocks.yaml.

Fix preserves the user's invariant ("no partial mocks; if mock is
dropped the corresponding TC must also be dropped"):

- syncMock.SyncMockManager grows a lastMongoMockResTime field,
  updated under m.mu in AddMock when a Mongo-kind mock arrives.
  Stored as the youngest observed ResTimestampMock so the agent's
  TC-commit path can compare any HTTP TC's req-time against it.

- LastMongoMockResTime() accessor returns the youngest observed time
  (zero if no mongo activity yet). Callers MUST treat zero as "mongo
  not in use" and NOT drop based on it — otherwise pre-handshake HTTP
  TCs in mongo apps would be wrongly dropped before the first mongo
  mock decodes.

- HandleIncoming drain() gains a per-TC orphan check after the
  existing 500ms tcHold + pressure-window check: when LastMongoMockResTime()
  is non-zero AND the TC's req-time is more than mongoSilenceTolerance
  (5s) NEWER than the youngest mongo mock, drop the TC and emit
  diag/stage-tc-mongo-silence-drop.

Sized at 5s because the normal decoder lag is ~50-200ms under load;
5s comfortably catches the 21s shutdown-orphan window without
dropping legitimate "mongo briefly idle" testcases.

Does NOT touch mongo encode.go / asyncMongoDecode itself — that
async architecture is load-bearing for throughput. The atomicity
guarantee is enforced at the TC commit boundary instead.
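The drop rule in this commit is implemented in Go inside HandleIncoming's drain(). The shell function below only restates that decision so the edge cases are easy to check: zero means "mongo not in use", and a TC must be strictly more than the tolerance newer than the youngest mongo mock before it is dropped. The name should_drop_tc and the millisecond-epoch inputs are assumptions made for this illustration.

```bash
#!/usr/bin/env bash
# Illustration of the orphan-TC rule (the real code is Go).
#   $1 = TC request time (ms since epoch, assumed)
#   $2 = youngest observed mongo mock ResTimestamp (ms; 0 = no mongo yet)
#   $3 = tolerance in ms (the commit uses 5s; defaults to 5000 here)
# Returns 0 when the TC should be dropped, 1 when it should be kept.
should_drop_tc() {
  local tc_req_ms="$1" last_mongo_ms="$2" tolerance_ms="${3:-5000}"
  # Zero means "mongo not in use yet": never drop based on it.
  if [ "$last_mongo_ms" -eq 0 ]; then
    return 1
  fi
  # Drop only if the TC is MORE than the tolerance newer than the last mock.
  [ $(( tc_req_ms - last_mongo_ms )) -gt "$tolerance_ms" ]
}
```

With the chronic-6 timings from the RCA (last mongo mock at 08:49:20, last HTTP TC at 08:49:41), the 21000 ms gap exceeds 5000 ms, so that TC would be dropped. A TC 200 ms after the last mock would be kept.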

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Harshit Pathak <harshit07pathak@gmail.com>
@github-actions

🚀 Keploy Performance Test Results

Multi-Run Validation: Tests run 3 times, pipeline fails only if 2+ runs show regression.

Run P50 P90 P99 RPS Error Rate Status
1 2.74ms 3.56ms 5.08ms 100.00 0.00% ✅ PASS
2 2.71ms 3.68ms 5.14ms 100.00 0.00% ✅ PASS
3 2.76ms 3.82ms 5.08ms 100.02 0.00% ✅ PASS

Thresholds: P50 < 5ms, P90 < 15ms, P99 < 70ms, RPS >= 100 (±1% tolerance), Error Rate < 1%

Result: PASSED - Only 0 out of 3 runs failed (threshold: 2)

P50, P90, and P99 percentiles naturally filter out outliers

Pulls in keploy/integrations debug/window-shift-diagnostic@700907d
which fixes mongo mock-timestamp drift under decoder back-pressure
(summary-17 RCA — find:customers mock 85 ms outside its test window).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Harshit Pathak <harshit07pathak@gmail.com>
@github-actions

🚀 Keploy Performance Test Results

Multi-Run Validation: Tests run 3 times, pipeline fails only if 2+ runs show regression.

| Run | P50 | P90 | P99 | RPS | Error Rate | Status |
|---|---|---|---|---|---|---|
| 1 | 2.78ms | 3.62ms | 5.23ms | 100.00 | 0.00% | ✅ PASS |
| 2 | 2.73ms | 3.7ms | 5.23ms | 100.02 | 0.00% | ✅ PASS |
| 3 | 2.8ms | 4.23ms | 5.68ms | 100.00 | 0.00% | ✅ PASS |

Thresholds: P50 < 5ms, P90 < 15ms, P99 < 70ms, RPS >= 100 (±1% tolerance), Error Rate < 1%

Result: PASSED - Only 0 out of 3 runs failed (threshold: 2)

P50, P90, and P99 percentiles naturally filter out outliers

The socket-time fix (integrations 700907d) traded one bug for
another:
- FIXED: rbrb summary-17 (find:customers mock outside per-test window)
- REGRESSED: rbrl get/delete-large-payloads-by-id-3 (mocks for these
  TCs went missing, likely a wire vs decoder-pipeline timestamp
  desync that I haven't pinned yet)

Net mongo failure count went 7 → 4, but with explicit regression on
previously-green tests. User preference is zero regressions, so
revert the socket-time stamp change. The orphan-TC drop in
HandleIncoming (still in place, commit 1e79b15 in keploy) remains
the load-bearing fix for the chronic-6 pattern.

Summary-17 becomes an intermittent corner case — needs separate RCA
for the per-test window vs async-decoder-lag interaction, but is not
the systematic chronic-6 issue.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Harshit Pathak <harshit07pathak@gmail.com>
@github-actions

🚀 Keploy Performance Test Results

Multi-Run Validation: Tests run 3 times, pipeline fails only if 2+ runs show regression.

| Run | P50 | P90 | P99 | RPS | Error Rate | Status |
|---|---|---|---|---|---|---|
| 1 | 2.75ms | 3.51ms | 5.05ms | 100.02 | 0.00% | ✅ PASS |
| 2 | 2.69ms | 3.68ms | 4.98ms | 100.03 | 0.00% | ✅ PASS |
| 3 | 2.76ms | 3.87ms | 5.21ms | 100.00 | 0.00% | ✅ PASS |

Thresholds: P50 < 5ms, P90 < 15ms, P99 < 70ms, RPS >= 100 (±1% tolerance), Error Rate < 1%

Result: PASSED - Only 0 out of 3 runs failed (threshold: 2)

P50, P90, and P99 percentiles naturally filter out outliers

@github-actions

🚀 Keploy Performance Test Results

Multi-Run Validation: Tests run 3 times, pipeline fails only if 2+ runs show regression.

| Run | P50 | P90 | P99 | RPS | Error Rate | Status |
|---|---|---|---|---|---|---|
| 1 | 2.69ms | 3.37ms | 4.95ms | 100.02 | 0.00% | ✅ PASS |
| 2 | 2.63ms | 3.42ms | 4.98ms | 100.00 | 0.00% | ✅ PASS |
| 3 | 2.68ms | 3.76ms | 5.03ms | 100.02 | 0.00% | ✅ PASS |

Thresholds: P50 < 5ms, P90 < 15ms, P99 < 70ms, RPS >= 100 (±1% tolerance), Error Rate < 1%

Result: PASSED - Only 0 out of 3 runs failed (threshold: 2)

P50, P90, and P99 percentiles naturally filter out outliers

…rain

Race in the orphan-TC drop: the existing code reads
syncMock.LastMongoMockResTime() at DRAIN time (after the 500ms
tcHold). In that 500ms window, late mongo mocks for OTHER tests
arrive and refresh the live timestamp. The orphan-check then sees a
fresh value and emits a TC that was actually orphaned at arrival.

Confirmed on run 26281482736 mongo-rbrl post-large-payloads-8:
- TC arrived at 10:10:49.965
- LAST mongo mock at TC arrival: 10:10:43.389 (6.6s gap → orphan)
- After tcHold, fresh mongo mocks at 10:10:49.988+ refreshed the live
  value; drain-time gap was only 22ms → orphan check didn't fire → TC
  committed without its underlying mongo insert → EOF at replay.

Fix: pendingTC now stores the LastMongoMockResTime() value taken at
TC arrival. Drain compares against this frozen value, ignoring any
mocks that landed during the tcHold window. This is the correct
semantics because the orphan condition is "this TC's req-time
arrived after a long gap in mongo decoding" — that's a property of
the moment the TC was captured, not of the moment it's being
emitted.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Harshit Pathak <harshit07pathak@gmail.com>
@github-actions

🚀 Keploy Performance Test Results

Multi-Run Validation: Tests run 3 times, pipeline fails only if 2+ runs show regression.

| Run | P50 | P90 | P99 | RPS | Error Rate | Status |
|---|---|---|---|---|---|---|
| 1 | 2.35ms | 2.94ms | 4.23ms | 100.02 | 0.00% | ✅ PASS |
| 2 | 2.31ms | 3.14ms | 4.2ms | 100.02 | 0.00% | ✅ PASS |
| 3 | 2.32ms | 3.23ms | 4.43ms | 100.02 | 0.00% | ✅ PASS |

Thresholds: P50 < 5ms, P90 < 15ms, P99 < 70ms, RPS >= 100 (±1% tolerance), Error Rate < 1%

Result: PASSED - Only 0 out of 3 runs failed (threshold: 2)

P50, P90, and P99 percentiles naturally filter out outliers

…n MySQL mock drop

Two related atomicity bugs caused TC-without-mock orphans in the
go-memory-load-mongo (rbrb) and go-memory-load-mysql (rbrl) lanes:

Bug 1 — mongo rbrb (delete-large-payloads-by-id-5):
The 1 MB GET response ahead of the DELETE bytes in decodeChan caused
the async decoder to process the delete mock AFTER the 500 ms tcHold
expired. By the time AddMock's backward currentPressureStart extension
fired, the TC had already been committed without its mock.

Fix: track `arrivedDuringPressure` at TC arrival. drain() now holds
such TCs (up to pressureHold=10 s) while pressure is active, giving
async decoders time to emit their mocks and trigger the extension
before the TC is drained.

Bug 2 — mysql rbrl (chronic-6: get-orders-1..5 + get-analytics-1):
MySQL mocks are dropped in recordMock() BEFORE AddMock is called, so
AddMock's in-line backward currentPressureStart extension never fires
for dropped MySQL mocks. IsHTTPTCInPressureWindow saw currentPressureStart
= pressure-fire-time, not the earlier mysql ReqTimestampMock, so TCs
whose round-trip completed just before pressure were not dropped.

Fix: add ExtendPressureWindow(reqTimestamp) to syncMock and call it
from recordMock when dropping, ensuring currentPressureStart is extended
even when AddMock is bypassed.

Also adds IsMemoryPressureActive() to syncMock for drain() to query
whether pressure is still in progress.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Harshit Pathak <harshit07pathak@gmail.com>
@github-actions

🚀 Keploy Performance Test Results

Multi-Run Validation: Tests run 3 times, pipeline fails only if 2+ runs show regression.

| Run | P50 | P90 | P99 | RPS | Error Rate | Status |
|---|---|---|---|---|---|---|
| 1 | 2.74ms | 3.57ms | 5.12ms | 100.02 | 0.00% | ✅ PASS |
| 2 | 2.69ms | 3.77ms | 5.03ms | 100.00 | 0.00% | ✅ PASS |
| 3 | 2.74ms | 4.03ms | 5.41ms | 100.02 | 0.00% | ✅ PASS |

Thresholds: P50 < 5ms, P90 < 15ms, P99 < 70ms, RPS >= 100 (±1% tolerance), Error Rate < 1%

Result: PASSED - Only 0 out of 3 runs failed (threshold: 2)

P50, P90, and P99 percentiles naturally filter out outliers

@github-actions

🚀 Keploy Performance Test Results

Multi-Run Validation: Tests run 3 times, pipeline fails only if 2+ runs show regression.

| Run | P50 | P90 | P99 | RPS | Error Rate | Status |
|---|---|---|---|---|---|---|
| 1 | 2.72ms | 3.53ms | 4.95ms | 100.01 | 0.00% | ✅ PASS |
| 2 | 2.65ms | 3.58ms | 4.97ms | 100.02 | 0.00% | ✅ PASS |
| 3 | 2.7ms | 3.98ms | 5.29ms | 100.00 | 0.00% | ✅ PASS |

Thresholds: P50 < 5ms, P90 < 15ms, P99 < 70ms, RPS >= 100 (±1% tolerance), Error Rate < 1%

Result: PASSED - Only 0 out of 3 runs failed (threshold: 2)

P50, P90, and P99 percentiles naturally filter out outliers

@github-actions

🚀 Keploy Performance Test Results

Multi-Run Validation: Tests run 3 times, pipeline fails only if 2+ runs show regression.

| Run | P50 | P90 | P99 | RPS | Error Rate | Status |
|---|---|---|---|---|---|---|
| 1 | 2.79ms | 3.71ms | 5.32ms | 100.02 | 0.00% | ✅ PASS |
| 2 | 2.64ms | 3.52ms | 4.81ms | 100.00 | 0.00% | ✅ PASS |
| 3 | 2.72ms | 3.99ms | 5.47ms | 100.02 | 0.00% | ✅ PASS |

Thresholds: P50 < 5ms, P90 < 15ms, P99 < 70ms, RPS >= 100 (±1% tolerance), Error Rate < 1%

Result: PASSED - Only 0 out of 3 runs failed (threshold: 2)

P50, P90, and P99 percentiles naturally filter out outliers


3 participants