feat: noise-based obfuscation-aware mock matching across all parsers #4026
slayerjain merged 17 commits into main
Conversation
Add support for mocks containing obfuscated (redacted) secret values prefixed with `__KEPLOY_REDACT__:`. During replay, obfuscated fields are completely excluded from the match score so they don't affect whether a mock is selected. This enables secret protection without breaking mock matching.

Changes:
- Add shared ObfuscationPrefix constant and helpers in util/obfuscate.go
- ExactBodyMatch: two-pass approach, trying an exact string match first, then a JSON-level comparison that skips obfuscated fields
- PerformFuzzyMatch: strip obfuscated values from mock bodies before Levenshtein/Jaccard similarity computation
- Add Info-level match percentage logging throughout the pipeline (schema match, exact body, body key, fuzzy)
- Add 14 unit tests covering scoring, stripping, and edge cases

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add a RecordHooks interface to the OSS record service, mirroring the TestHooks pattern used in replay. This allows enterprise to inject behaviour (e.g. secret obfuscation) into the recording pipeline via Before/AfterTestCaseInsert and Before/AfterMockInsert hooks without wrapping DB interfaces. Includes BaseRecordHooks (embeddable no-op), struct-based context params for forward compatibility, and SetRecordHooks/GetRecordHooks on the Recorder.
Move noise from unstructured Metadata["noise"] JSON string to a typed Noise []string field on MockSpec. This gives the mock matcher a defined structure to read obfuscation patterns from.
Add Noise []string to all per-protocol schema structs (HTTPSchema, GrpcSpec, MongoSpec, DNSSchema, GenericSchema, RedisSchema, KafkaSchema, HTTP2Schema, postgres.Spec, mysql.Spec) and wire it through EncodeMock and DecodeMocks so the field is serialized to/from YAML.
Move the Noise []string field from the per-protocol schema structs and MockSpec up to the Mock struct and NetworkTrafficDoc. This places noise patterns at the YAML root level alongside version/kind/name rather than buried inside each protocol's spec, since noise is mock-level metadata that is protocol-agnostic.
Replace prefix-based obfuscation detection with noise-pattern matching using Mock.Noise regex patterns. This handles all obfuscated character classes (alphanumeric, digit-only, hex) uniformly.

Changes:
- Rewrite util/obfuscate.go with NoiseChecker type (compile, cache, check)
- Rework HTTP parser to use NoiseChecker instead of prefix checks
- Add noise check in JSONDiffWithNoiseControl for HTTP/gRPC matchers
- Add noise check in MySQL paramValueEqual
- Add noise check in Generic findExactMatch/findBinaryMatch

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pull request overview
This PR introduces noise-pattern–based obfuscation awareness for mock matching by persisting per-mock regex “noise” patterns (Mock.Noise) and teaching multiple matchers/parsers to treat matching values as ignorable noise during comparison.
Changes:
- Add `Mock.Noise` and persist it through YAML encoding/decoding (`noise` field in mock docs).
- Introduce `util.NoiseChecker` (plus helpers like `StripNoisyJSON` and `JSONBodyMatchScore`) and thread it into JSON diff and protocol matchers (HTTP/MySQL/Generic).
- Add `RecordHooks` to allow injecting behavior around test case/mock insertion in the recording pipeline.
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| pkg/service/record/record.go | Adds RecordHooks to the recorder and calls hook callbacks around inserts. |
| pkg/service/record/hooks.go | Introduces hook interfaces and no-op base implementation. |
| pkg/platform/yaml/yaml.go | Adds persisted noise field to YAML mock document schema. |
| pkg/platform/yaml/mockdb/util.go | Encodes/decodes Mock.Noise into/from YAML documents. |
| pkg/models/mock.go | Adds Noise []string to models.Mock. |
| pkg/matcher/utils.go | Extends JSON diff to accept an obfuscation NoiseChecker and skip noisy values. |
| pkg/matcher/schema/match.go | Updates JSON diff call signature usage. |
| pkg/matcher/http/match.go | Updates JSON diff call signature usage. |
| pkg/matcher/http/absmatch.go | Updates JSON diff call signature usage. |
| pkg/matcher/grpc/match.go | Updates JSON diff call signature usage. |
| pkg/agent/proxy/integrations/util/obfuscate.go | Adds NoiseChecker with cached regex compilation and JSON helpers for stripping/scoring. |
| pkg/agent/proxy/integrations/mysql/replayer/match.go | Skips noisy param values during MySQL statement execute param matching. |
| pkg/agent/proxy/integrations/http/match.go | Implements noise-aware exact/fuzzy HTTP body matching and adds match logging. |
| pkg/agent/proxy/integrations/http/match_test.go | Adds tests for noise-aware JSON scoring, stripping, and exact body matching. |
| pkg/agent/proxy/integrations/generic/match.go | Skips/handles noisy generic payloads in exact/binary matching. |
| cli/provider/core_service.go | Updates recorder constructor call to pass hooks argument. |
```go
// Noise holds exact-match regex patterns for obfuscated values.
// During mock matching, any stored value matching a pattern in this
// list is skipped (treated as noise). Written by the enterprise
// secret-protection obfuscator.
Noise []string `json:"Noise,omitempty" bson:"noise,omitempty" yaml:"noise,omitempty"`
```
Mock.Noise is added, but Mock.DeepCopy() currently doesn't copy the Noise slice, so any code that deep-copies mocks (e.g., to avoid races) will silently drop noise patterns and lose obfuscation-aware matching. Update DeepCopy to deep-copy Noise (and include it in the returned Mock).
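A minimal sketch of the suggested DeepCopy fix, using a reduced stand-in struct (the real models.Mock has many more fields):

```go
package main

import "fmt"

// Mock is a reduced stand-in for models.Mock; only Noise is relevant here.
type Mock struct {
	Name  string
	Noise []string
}

// DeepCopy returns a copy whose Noise slice shares no backing array with the
// original, so later writes to the original cannot race with readers of the copy.
func (m *Mock) DeepCopy() *Mock {
	c := &Mock{Name: m.Name}
	if m.Noise != nil {
		c.Noise = make([]string, len(m.Noise))
		copy(c.Noise, m.Noise)
	}
	return c
}

func main() {
	orig := &Mock{Name: "m1", Noise: []string{"^[a-f0-9]{32}$"}}
	cp := orig.DeepCopy()
	orig.Noise[0] = "mutated"
	fmt.Println(cp.Noise[0]) // prints "^[a-f0-9]{32}$": the copy is unaffected
}
```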
```go
nc := util.NewNoiseChecker(mock.Noise)
for requestIndex, reqBuff := range reqBuffs {
	mockData := mock.Spec.GenericRequests[requestIndex].Message[0].Data

	// If mock data is noisy (obfuscated), give it a perfect similarity score
	if nc != nil && nc.IsNoisy(mockData) {
		if 1.0 > mxSim {
			mxSim = 1.0
			mxIdx = idx
		}
		continue
	}
```
In findBinaryMatch, treating a single noisy mock buffer as a perfect similarity (mxSim=1.0) can cause this mock to win even if other request buffers are completely different, since matching uses a global max similarity across all buffers. Instead, skip noisy buffers in the similarity calculation (or aggregate similarity across all non-noisy buffers) so noise doesn't force an unconditional best match.
```go
h.Logger.Info("http mock schema match results",
	zap.Int("schema_matched", len(schemaMatched)),
	zap.Int("total_http_mocks", len(unfilteredMocks)))
```
These Info-level logs run for every request match attempt and will likely flood logs in normal operation. Consider downgrading to Debug (or gating behind a verbose flag) to avoid high log volume and performance impact.
```go
matched, total, noisy := util.JSONBodyMatchScore(mockData, reqData, nc)

pct := 100.0
if total > 0 {
	pct = float64(matched) / float64(total) * 100
}
h.Logger.Info("http mock match score (noise-aware)",
	zap.String("mock", mock.Name),
	zap.Int("matched_fields", matched),
	zap.Int("total_fields", total),
	zap.Int("noisy_fields_skipped", noisy),
	zap.Float64("match_percentage", pct))

if matched == total {
	return true, mock
}
```
The noise-aware ExactBodyMatch uses JSONBodyMatchScore and then treats matched==total as an exact match, but the score only iterates over keys present in the mock JSON. This allows requests with extra non-noisy fields to still be considered an "exact" body match. If this is meant to be exact equality ignoring noisy fields, ensure the comparison also fails on extra request keys (except those corresponding to skipped noisy fields), e.g., by stripping noisy fields from both sides and doing a deep equality check.
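A sketch of the "strip noisy fields from both sides, then deep-compare" approach the comment suggests. `isNoisy` is a hypothetical stand-in for the NoiseChecker (here, anything equal to the placeholder `"<redacted>"` counts as noise); the key point is that extra non-noisy request keys survive stripping and fail the equality check:

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// isNoisy is a stand-in for NoiseChecker matching; the real checker uses
// the mock's regex noise patterns.
func isNoisy(v interface{}) bool {
	s, ok := v.(string)
	return ok && s == "<redacted>"
}

// stripByMockNoise drops every key whose *mock* value is noisy from both
// sides, so a plain deep-equality check afterwards also rejects requests
// with extra non-noisy keys.
func stripByMockNoise(mock, req interface{}) (interface{}, interface{}) {
	mm, okM := mock.(map[string]interface{})
	rm, okR := req.(map[string]interface{})
	if !okM || !okR {
		return mock, req
	}
	outM := map[string]interface{}{}
	outR := map[string]interface{}{}
	for k, mv := range mm {
		if isNoisy(mv) {
			continue // noisy in the mock: excluded from both sides
		}
		if rv, ok := rm[k]; ok {
			outM[k], outR[k] = stripByMockNoise(mv, rv)
		} else {
			outM[k] = mv // missing in the request: will fail DeepEqual
		}
	}
	for k, rv := range rm {
		if _, inMock := mm[k]; !inMock {
			outR[k] = rv // extra request key: will fail DeepEqual
		}
	}
	return outM, outR
}

func exactIgnoringNoise(mockJSON, reqJSON []byte) bool {
	var m, r interface{}
	if json.Unmarshal(mockJSON, &m) != nil || json.Unmarshal(reqJSON, &r) != nil {
		return false
	}
	sm, sr := stripByMockNoise(m, r)
	return reflect.DeepEqual(sm, sr)
}

func main() {
	mock := []byte(`{"user":"alice","token":"<redacted>"}`)
	fmt.Println(exactIgnoringNoise(mock, []byte(`{"user":"alice","token":"anything"}`)))       // true
	fmt.Println(exactIgnoringNoise(mock, []byte(`{"user":"alice","token":"x","admin":true}`))) // false
}
```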
```go
h.Logger.Info("http mock match score (noise-aware)",
	zap.String("mock", mock.Name),
	zap.Int("matched_fields", matched),
	zap.Int("total_fields", total),
	zap.Int("noisy_fields_skipped", noisy),
	zap.Float64("match_percentage", pct))
```
Logging a per-mock noise-aware match score at Info inside the match loop can be extremely noisy and expensive at runtime (N mocks per request). Consider changing this to Debug (or only logging the winning mock / when no match is found) to reduce operational overhead.
```go
if hookErr := r.hooks.AfterTestCaseInsert(ctx, &TestCaseContext{
	TestCase: testCase, TestSetID: newTestSetID,
}); hookErr != nil {
	r.logger.Error("AfterTestCaseInsert hook failed", zap.Error(hookErr))
}
```
Hook failure is logged as an error but doesn't provide any actionable next step or context (e.g., which test case failed). Consider including identifiers (test case name/ID, testSetID) and a next-step hint (e.g., "disable custom record hooks" or "check enterprise hook implementation logs").
```go
if hookErr := r.hooks.BeforeMockInsert(ctx, &MockContext{
	Mock: mock, TestSetID: newTestSetID,
}); hookErr != nil {
	r.logger.Error("BeforeMockInsert hook failed", zap.Error(hookErr))
}
```
Hook failure is logged as an error but doesn't provide any actionable next step or context (e.g., which mock failed). Consider including identifiers (mock name/kind, testSetID) and a next-step hint (e.g., "disable custom record hooks" or "check enterprise hook implementation logs").
```go
if hookErr := r.hooks.AfterMockInsert(ctx, &MockContext{
	Mock: mock, TestSetID: newTestSetID,
}); hookErr != nil {
	r.logger.Error("AfterMockInsert hook failed", zap.Error(hookErr))
}
```
Hook failure is logged as an error but doesn't provide any actionable next step or context (e.g., which mock failed). Consider including identifiers (mock name/kind, testSetID) and a next-step hint (e.g., "disable custom record hooks" or "check enterprise hook implementation logs").
```go
compiled, err := regexp.Compile(pattern)
if err != nil {
	return nil // skip invalid patterns silently
}
```
getCachedRegexp silently drops invalid regex patterns (returns nil) with no surfaced error, which can make "noise" mismatches very hard to diagnose. Consider returning an error (or at least collecting/reporting invalid patterns via a debug log or counter) so misconfigured Mock.Noise patterns are visible to operators.
```go
h.Logger.Info("http mock body key match results",
	zap.Int("body_key_matched", len(bodyMatched)),
	zap.Int("schema_matched", len(schemaMatched)))
```
This Info-level aggregate log is emitted on every JSON-body match pass and may flood logs under load. Consider changing to Debug (or only logging when multiple candidates remain / when matching fails) to keep production logs actionable.
🚀 Keploy Performance Test Results

Multi-Run Validation: tests run 3 times; the pipeline fails only if 2+ runs show regression.
Thresholds: P50 < 5ms, P90 < 15ms, P99 < 70ms, RPS >= 100 (±1% tolerance), Error Rate < 1%
✅ Result: PASSED (0 out of 3 runs failed; threshold: 2). P50, P90, and P99 percentiles naturally filter out outliers.
…pipeline

Point the build-and-upload and build-docker-image jobs to the integrations branch with obfuscation-aware parser changes so the CI pipeline tests the full stack together.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Blank lines after Metadata break gofmt's column alignment group, causing the CI lint check to fail. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- DeepCopy now copies the Noise slice to prevent race conditions
- findBinaryMatch aggregates similarity across non-noisy buffers instead of forcing perfect score on noisy ones
- Downgrade per-request match logs from Info to Debug to reduce noise
- ExactBodyMatch now rejects requests with extra non-noisy keys
- Hook failure logs include testSetID, name, and kind for debugging
- getCachedRegexp warns on invalid regex patterns instead of silently dropping them

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pull request overview
Copilot reviewed 17 out of 17 changed files in this pull request and generated 8 comments.
```go
compiled, err := regexp.Compile(pattern)
if err != nil {
	log.Printf("WARNING: invalid noise regex pattern %q: %v — pattern will be ignored", pattern, err)
	return nil
```
getCachedRegexp logs via log.Printf("WARNING: ...") when a noise regex is invalid. This adds a warning-style log line (and uses the stdlib logger) without a clear next step for users. Consider returning/propagating the compile error to the caller so it can be logged via the existing zap logger with actionable guidance (e.g., which config/mock produced the pattern), or silently ignoring invalid patterns if they’re not user-actionable.
```go
case []interface{}:
	rv, ok := reqVal.([]interface{})
	if !ok {
		return false
	}
	for i := 0; i < len(mv) && i < len(rv); i++ {
		if nc.IsNoisyValue(mv[i]) {
			continue
		}
		if HasExtraNonNoisyKeys(mv[i], rv[i], nc) {
			return true
		}
	}
	return false
```
HasExtraNonNoisyKeys doesn’t treat extra elements in request arrays as “extra non-noisy keys”. In the []interface{} case it only compares up to min(len(mv), len(rv)) and then returns false, so a request like [1,2,3] can be considered “exact” against a mock [1,2] (when matched == total). Consider adding a length check (e.g., if len(rv) > len(mv) then return true) so arrays with additional request elements don’t incorrectly pass as exact matches.
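A minimal sketch of the suggested length check for arrays. `hasExtraNonNoisyElems` and `isNoisy` are hypothetical helpers, not the real `HasExtraNonNoisyKeys`/NoiseChecker; the point is only that extra trailing request elements count as a mismatch:

```go
package main

import "fmt"

// isNoisy is a stand-in for NoiseChecker.IsNoisyValue.
func isNoisy(v interface{}) bool { return v == "<redacted>" }

// hasExtraNonNoisyElems compares the overlapping prefix (skipping noisy mock
// values) and additionally rejects requests with extra trailing elements.
func hasExtraNonNoisyElems(mock, req []interface{}) bool {
	if len(req) > len(mock) {
		return true // the request has elements the mock never recorded
	}
	for i := range req {
		if isNoisy(mock[i]) {
			continue
		}
		if mock[i] != req[i] {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(hasExtraNonNoisyElems([]interface{}{1, 2}, []interface{}{1, 2, 3})) // true
	fmt.Println(hasExtraNonNoisyElems([]interface{}{1, 2}, []interface{}{1, 2}))    // false
}
```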
```go
if hookErr := r.hooks.BeforeTestCaseInsert(ctx, &TestCaseContext{
	TestCase: testCase, TestSetID: newTestSetID,
}); hookErr != nil {
	r.logger.Error("BeforeTestCaseInsert hook failed",
		zap.Error(hookErr),
		zap.String("testSetID", newTestSetID),
		zap.String("testCaseName", testCase.Name))
}
```
Hook error logs (e.g., BeforeTestCaseInsert hook failed) don’t provide any actionable next step for the user, even though recording continues after the failure. Consider including guidance such as how to disable the hook/feature or where to look (enterprise hook implementation) and whether the failure affects recording results, so users know what to do when they see this error.
```go
if hookErr := r.hooks.AfterTestCaseInsert(ctx, &TestCaseContext{
	TestCase: testCase, TestSetID: newTestSetID,
}); hookErr != nil {
	r.logger.Error("AfterTestCaseInsert hook failed",
		zap.Error(hookErr),
		zap.String("testSetID", newTestSetID),
		zap.String("testCaseName", testCase.Name))
}
```
Hook error logs (e.g., AfterTestCaseInsert hook failed) don’t provide an actionable next step and don’t indicate whether the test case was still recorded successfully. Consider enriching the log message/fields to clarify impact (insert succeeded vs not) and provide guidance on resolving or disabling the hook implementation.
```go
if hookErr := r.hooks.AfterMockInsert(ctx, &MockContext{
	Mock: mock, TestSetID: newTestSetID,
}); hookErr != nil {
	r.logger.Error("AfterMockInsert hook failed",
		zap.Error(hookErr),
		zap.String("testSetID", newTestSetID),
		zap.String("mockName", mock.Name),
		zap.String("mockKind", mock.GetKind()))
}
```
Hook error logs (e.g., AfterMockInsert hook failed) don’t provide an actionable next step and don’t clarify whether the mock insert succeeded (it appears to, since this is in the success branch). Consider adjusting the message/fields to state the insert result and include guidance for fixing or disabling the hook implementation so users can act on the error.
```go
// First pass: exact string match (fastest path)
for _, mock := range schemaMatched {
	if mock.Spec.HTTPReq.Body == string(body) {
		h.Logger.Info("http mock matched",
			zap.String("mock", mock.Name),
			zap.Float64("match_percentage", 100.0),
			zap.String("match_type", "exact_body"))
		return true, mock
```
These new Info logs on every successful HTTP mock match (http mock matched) may be very high-volume in normal operation (proxy matching runs per request) and could noticeably increase log noise/cost. Consider downgrading to Debug (or gating behind a verbose flag), keeping Info for only user-actionable lifecycle events.
```yaml
- name: Add Private Parsers
  if: ${{ (github.event_name == 'pull_request' && !github.event.pull_request.head.repo.fork) || (github.event_name == 'push' && github.ref == 'refs/heads/main') }}
  uses: ./.github/actions/setup-private-parsers
  with:
    ssh-private-key: ${{ secrets.INTEGRATIONS_REPO_DEPLOY_KEY_PRIVATE }}
    go-cache: true
    integration-ref: feat/obfuscation-aware-matching
```
CI is pinned to a feature branch of the private parsers repo via integration-ref: feat/obfuscation-aware-matching. This can break builds when the branch is force-pushed/deleted and makes CI results less reproducible. Consider pinning to a tag/commit SHA (or omitting integration-ref to use the default) and updating it through a controlled release process.
```yaml
- name: Add Private Parsers
  if: ${{ (github.event_name == 'pull_request' && !github.event.pull_request.head.repo.fork) || (github.event_name == 'push' && github.ref == 'refs/heads/main') }}
  uses: ./.github/actions/setup-private-parsers
  with:
    ssh-private-key: ${{ secrets.INTEGRATIONS_REPO_DEPLOY_KEY_PRIVATE }}
    go-cache: true
    integration-ref: feat/obfuscation-aware-matching
```
This second integration-ref: feat/obfuscation-aware-matching occurrence also pins CI to a moving branch for the docker-image build job. Consider using the same pinned tag/SHA approach here as well so both jobs build against an immutable parser revision.
- Remove stdlib log.Printf from getCachedRegexp; silently skip invalid regex patterns since they are not user-actionable
- Add array length check in HasExtraNonNoisyKeys so requests with extra array elements don't pass as exact matches
- Add actionable guidance and impact clarity to all hook error logs
- Downgrade all "http mock matched" Info logs to Debug to reduce per-request log volume
- Pin CI integration-ref to commit SHA instead of moving branch name

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Ayush Sharma <kshitij3160@gmail.com>
Pull request overview
Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.
```go
// Build set of mock keys (excluding noisy ones)
mockKeys := make(map[string]struct{}, len(mv))
for k, v := range mv {
	if !nc.IsNoisyValue(v) {
		mockKeys[k] = struct{}{}
	}
}
for k := range rv {
	if _, exists := mockKeys[k]; !exists {
		return true
	}
}
```
HasExtraNonNoisyKeys treats any request key not in mockKeys as “extra”, but mockKeys is built by excluding keys whose mock value is noisy. This makes common cases like {password: <noisy>} in the mock be considered an extra key in the request (since password is excluded from mockKeys), causing otherwise-valid noise-aware exact matches to be rejected. Consider treating keys as present regardless of noise (only skip value comparisons/recursion for noisy fields), or include noisy keys in the allowed key set while still ignoring their values.
```go
case []interface{}:
	var result []interface{}
	for _, item := range v {
		if nc.IsNoisyValue(item) {
			continue
		}
		result = append(result, StripNoisyFields(item, nc))
	}
	return result
```
StripNoisyFields uses a nil []interface{} result when all array elements are stripped. encoding/json marshals a nil slice as null, which can change semantics (and similarity scoring) compared to an empty array []. Initializing result with make([]interface{}, 0, len(v)) avoids emitting null for empty arrays.
```go
// String-based fuzzy matching (Levenshtein distance)
reqStr := string(reqBuff)
if util.IsASCII(reqStr) {
	idx := h.findStringMatch(reqStr, mockStrings)
	if idx != -1 {
		h.Logger.Debug("string match found", zap.String("mock name", tcsMocks[idx].Name))
		dist := levenshtein.ComputeDistance(reqStr, mockStrings[idx])
		maxLen := len(reqStr)
		if len(mockStrings[idx]) > maxLen {
			maxLen = len(mockStrings[idx])
		}
		pct := 0.0
		if maxLen > 0 {
			pct = (1.0 - float64(dist)/float64(maxLen)) * 100
		}
```
PerformFuzzyMatch recomputes Levenshtein distance (ComputeDistance) for logging even though findStringMatch already computed distances while selecting the best match. This adds extra O(n*m) work on the hot path purely for debug output. Consider returning the distance from findStringMatch (or computing the percentage only when debug is enabled) to avoid the duplicate computation.
```diff
 // JSONDiffWithNoiseControl compares JSON with support for both Path-based noise (e.g. "body.user.id")
 // and Global noise (e.g. "timestamp") to be ignored everywhere.
-func JSONDiffWithNoiseControl(validatedJSON ValidatedJSON, noise map[string][]string, ignoreOrdering bool) (JSONComparisonResult, error) {
+func JSONDiffWithNoiseControl(validatedJSON ValidatedJSON, noise map[string][]string, ignoreOrdering bool, obfuscationNoise *util.NoiseChecker) (JSONComparisonResult, error) {
 	// Split noise into Path-based (contains dots) and Global (no dots)
 	pathNoise := make(map[string][]string)
```
JSONDiffWithNoiseControl now accepts obfuscationNoise, but all in-repo callers pass nil (HTTP matcher, gRPC matcher, schema matcher). As a result, the new obfuscation-aware branch in matchJSONWithNoiseHandlingIndexed is currently unused in this codebase. If the PR intends to make matchers obfuscation-aware, plumb a real NoiseChecker from the relevant source; otherwise consider removing this parameter to avoid dead/untested code paths.
```diff
 		continue
 	}

 	_ = base64.StdEncoding.EncodeToString(reqBuff)
-	encoded, _ := util.DecodeBase64(mock.Spec.GenericRequests[requestIndex].Message[0].Data)
+	encoded, _ := util.DecodeBase64(mockData)
```
_ = base64.StdEncoding.EncodeToString(reqBuff) is a no-op (result unused) and can be removed. Leaving it in looks like leftover debugging and makes the inner matching loop harder to read.
- HasExtraNonNoisyKeys: include noisy keys in allowed key set so requests with noisy fields are not falsely rejected (fixes TestExactBodyMatch_NoisyFullMatch)
- StripNoisyFields: initialize empty slice instead of nil to avoid marshaling as null
- findStringMatch: return distance alongside index to eliminate duplicate Levenshtein computation in PerformFuzzyMatch
- Remove unused obfuscationNoise param from JSONDiffWithNoiseControl and all callers; obfuscation-aware matching lives in the proxy layer
- Remove no-op base64 encoding in generic/match.go
- Remove integration-ref from CI workflow to match main branch

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
```diff
 dist := levenshtein.ComputeDistance(req, mock)
 if dist == 0 {
-	return 0
+	return 0, 0
 }
```
In findStringMatch, when an exact string match is found (dist == 0), the function currently returns index 0 regardless of which mock string matched. This can select the wrong mock if the exact match occurs at a later index (especially after noise stripping). Return the current idx (and distance 0) instead of hardcoding 0.
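A sketch of the suggested fix, with a minimal Levenshtein standing in for `levenshtein.ComputeDistance` (the real function and loop structure differ; this only illustrates returning the current index and distance on an exact match):

```go
package main

import "fmt"

// distance is a minimal Levenshtein implementation standing in for
// levenshtein.ComputeDistance.
func distance(a, b string) int {
	prev := make([]int, len(b)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(a); i++ {
		curr := make([]int, len(b)+1)
		curr[0] = i
		for j := 1; j <= len(b); j++ {
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
			}
			curr[j] = minInt(minInt(prev[j]+1, curr[j-1]+1), prev[j-1]+cost)
		}
		prev = curr
	}
	return prev[len(b)]
}

func minInt(x, y int) int {
	if x < y {
		return x
	}
	return y
}

// findStringMatch returns the best index and its distance. On an exact match
// it returns the current idx (the review's fix) rather than a hardcoded 0,
// and the caller can reuse the returned distance instead of recomputing it.
func findStringMatch(req string, mocks []string) (int, int) {
	bestIdx, bestDist := -1, -1
	for idx, m := range mocks {
		d := distance(req, m)
		if d == 0 {
			return idx, 0
		}
		if bestDist == -1 || d < bestDist {
			bestIdx, bestDist = idx, d
		}
	}
	return bestIdx, bestDist
}

func main() {
	idx, dist := findStringMatch("GET /users", []string{"POST /a", "GET /users"})
	fmt.Println(idx, dist) // prints "1 0": the exact match at index 1, not index 0
}
```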
```go
for idx := range tcsMocks {
	mockBody := []byte(mockStrings[idx])
	k := util.AdaptiveK(len(reqBuff), 3, 8, 5)
	shingles1 := util.CreateShingles(mockBody, k)
	shingles2 := util.CreateShingles(reqBuff, k)
	similarity := util.JaccardSimilarity(shingles1, shingles2)
	if mxSim < similarity {
```
PerformFuzzyMatch's binary (Jaccard) loop recomputes request shingles (CreateShingles(reqBuff, k)) for every mock. Since reqBuff and k are constant across iterations, precomputing the request shingles once outside the loop would avoid repeated work and reduce CPU for large bodies / many mocks.
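A sketch of the hoisting suggestion, with simplified stand-ins for `util.CreateShingles` and `util.JaccardSimilarity` (the real helpers may differ in representation and tuning):

```go
package main

import "fmt"

// createShingles returns the set of k-length substrings of data.
func createShingles(data []byte, k int) map[string]struct{} {
	s := make(map[string]struct{})
	for i := 0; i+k <= len(data); i++ {
		s[string(data[i:i+k])] = struct{}{}
	}
	return s
}

// jaccard computes |A ∩ B| / |A ∪ B| over shingle sets.
func jaccard(a, b map[string]struct{}) float64 {
	inter := 0
	for s := range a {
		if _, ok := b[s]; ok {
			inter++
		}
	}
	union := len(a) + len(b) - inter
	if union == 0 {
		return 1.0
	}
	return float64(inter) / float64(union)
}

func main() {
	req := []byte("GET /users/42")
	mocks := [][]byte{[]byte("GET /orders/7"), []byte("GET /users/99")}

	k := 3
	reqShingles := createShingles(req, k) // hoisted out of the loop, as the review suggests

	bestIdx, bestSim := -1, -1.0
	for i, m := range mocks {
		if sim := jaccard(createShingles(m, k), reqShingles); sim > bestSim {
			bestIdx, bestSim = i, sim
		}
	}
	fmt.Println(bestIdx) // prints 1: the /users mock shares more shingles with the request
}
```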
```go
// Global cache for compiled regexes to avoid recompiling the same patterns
// across multiple mock comparisons.
var (
	noiseCacheMu sync.RWMutex
	noiseCache   = make(map[string]*regexp.Regexp)
)
```
noiseCache is a global map with no eviction/size bound. Since cache keys are the raw regex patterns (likely derived from recorded mocks), long-running processes that load many distinct mocks/patterns can grow this map unbounded and retain memory permanently. Consider bounding the cache (LRU/TTL) or scoping it to a replay/recording session instead of using a process-wide map.
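A sketch of one way to bound the cache. The size cap and full-eviction policy are illustrative choices (not Keploy's actual implementation); it also caches invalid patterns as nil entries so a bad pattern is not recompiled on every call:

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

// boundedRegexCache caps the number of cached patterns; when full it is
// cleared wholesale, trading occasional recompilation for a hard memory bound.
type boundedRegexCache struct {
	mu         sync.Mutex
	entries    map[string]*regexp.Regexp
	maxEntries int
}

func newBoundedRegexCache(max int) *boundedRegexCache {
	return &boundedRegexCache{entries: make(map[string]*regexp.Regexp), maxEntries: max}
}

func (c *boundedRegexCache) get(pattern string) *regexp.Regexp {
	c.mu.Lock()
	defer c.mu.Unlock()
	if re, ok := c.entries[pattern]; ok {
		return re // may be nil: a cached "invalid pattern" negative result
	}
	if len(c.entries) >= c.maxEntries {
		c.entries = make(map[string]*regexp.Regexp) // full eviction
	}
	re, err := regexp.Compile(pattern)
	if err != nil {
		re = nil // cache the failure so the bad pattern is not recompiled
	}
	c.entries[pattern] = re
	return re
}

func main() {
	c := newBoundedRegexCache(2)
	fmt.Println(c.get(`^[0-9]+$`) != nil) // true: valid pattern compiles
	fmt.Println(c.get(`([`) != nil)       // false: invalid pattern cached as nil
}
```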
- findStringMatch: return actual idx on exact match instead of hardcoded 0, which could select the wrong mock
- PerformFuzzyMatch: precompute request shingles once outside the Jaccard loop to avoid redundant O(n) work per mock
- Bound noiseCache to 1024 entries with full eviction to prevent unbounded memory growth in long-running processes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
```go
// JSON-level comparison skipping noisy fields
if !pkg.IsJSON([]byte(mockBody)) || !pkg.IsJSON(body) {
	continue
}

var mockData, reqData interface{}
if err := json.Unmarshal([]byte(mockBody), &mockData); err != nil {
	continue
}
if err := json.Unmarshal(body, &reqData); err != nil {
	continue
}
```
ExactBodyMatch does JSON parsing of the request body inside the per-mock loop (pkg.IsJSON + json.Unmarshal of body each iteration). This is O(#mocks) re-parsing and can become a noticeable hot path when many mocks share the same schema. Parse/validate the request JSON once before the loop and reuse reqData (and an isReqJSON bool) while iterating mocks.
```go
// Binary fuzzy matching (Jaccard similarity) with stripped mock bodies
mxSim := -1.0
mxIdx := -1
k := util.AdaptiveK(len(reqBuff), 3, 8, 5)
reqShingles := util.CreateShingles(reqBuff, k)
for idx := range tcsMocks {
	mockBody := []byte(mockStrings[idx])
	mockShingles := util.CreateShingles(mockBody, k)
	similarity := util.JaccardSimilarity(mockShingles, reqShingles)
	if mxSim < similarity {
		mxSim = similarity
		mxIdx = idx
	}
}
```
PerformFuzzyMatch now duplicates the Jaccard-similarity loop that already exists in findBinaryMatch, but against mockStrings (noise-stripped). This duplication makes it easier for the two implementations to drift. Consider refactoring so the binary fuzzy match logic lives in one place (e.g., update findBinaryMatch to accept the preprocessed mock bodies / NoiseChecker and call it here).
```go
func getCachedRegexp(pattern string) *regexp.Regexp {
	noiseCacheMu.RLock()
	re := noiseCache[pattern]
	noiseCacheMu.RUnlock()
	if re != nil {
		return re
	}
	compiled, err := regexp.Compile(pattern)
	if err != nil {
		return nil // invalid pattern — silently skipped; not user-actionable
	}
```
getCachedRegexp recompiles the same invalid regex pattern on every call because failures aren’t cached (it returns nil immediately). If an invalid pattern appears in Mock.Noise, this can become a repeated compile cost during matching. Consider caching a negative result (e.g., store a sentinel) or filtering/validating patterns once so subsequent checks don’t re-run regexp.Compile.
```go
// HasExtraNonNoisyKeys checks whether reqVal contains keys not present in
// mockVal (excluding keys whose mock value is noisy). Returns true if extra
// non-noisy keys exist, meaning the request is not an exact match.
func HasExtraNonNoisyKeys(mockVal, reqVal interface{}, nc *NoiseChecker) bool {
	switch mv := mockVal.(type) {
	case map[string]interface{}:
		rv, ok := reqVal.(map[string]interface{})
		if !ok {
			return false
		}
		// Build set of all mock keys — noisy keys are still valid keys,
		// we only skip their value comparison, not their presence.
		mockKeys := make(map[string]struct{}, len(mv))
```
The doc comment for HasExtraNonNoisyKeys says it excludes keys whose mock value is noisy, but the implementation explicitly treats noisy keys as valid keys for presence checks (and only skips value recursion). Please align the comment and behavior to avoid confusion for future maintainers (either update the comment or adjust the key-handling logic).
- ExactBodyMatch: parse request JSON once before the noise-aware loop instead of re-parsing per mock
- Extract jaccardBestMatch helper to deduplicate Jaccard similarity logic between findBinaryMatch and PerformFuzzyMatch
- getCachedRegexp: cache invalid patterns as nil entries to avoid repeated regexp.Compile calls for the same bad pattern

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
🚀 Keploy Performance Test Results

Multi-Run Validation: Tests run 3 times; the pipeline fails only if 2+ runs show regression.
Thresholds: P50 < 5ms, P90 < 15ms, P99 < 70ms, RPS >= 100 (±1% tolerance), Error Rate < 1%
✅ Result: PASSED (0 out of 3 runs failed; threshold: 2)
P50, P90, and P99 percentiles naturally filter out outliers.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
```go
nc := util.NewNoiseChecker(mock.Noise)
var simSum float64
var simCount int
for requestIndex, reqBuff := range reqBuffs {
	mockData := mock.Spec.GenericRequests[requestIndex].Message[0].Data
	// Skip noisy (obfuscated) buffers — don't let them influence similarity
	if nc != nil && nc.IsNoisy(mockData) {
		continue
	}
	encoded, _ := util.DecodeBase64(mockData)
	similarity := fuzzyCheck(encoded, reqBuff)
	simSum += similarity
	simCount++
}
// Compute average similarity across non-noisy buffers
if simCount > 0 {
	avgSim := simSum / float64(simCount)
	if avgSim > mxSim {
```
If all buffers for a mock are marked noisy, simCount stays 0 and the mock is never considered (even though, conceptually, it may be a valid match when everything is obfuscated). Handle the simCount == 0 case explicitly (e.g., treat it as a perfect/neutral similarity or fall back to a different tie-breaker) so fully-redacted generic interactions remain matchable.
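The suggested handling can be sketched as below: noisy buffers are excluded from the score, and a mock whose buffers are all noisy gets a neutral similarity of 1.0 so it stays matchable. Names are illustrative, not Keploy's exact code:

```go
package main

import "fmt"

// averageSimilarity averages per-buffer similarities, skipping noisy
// (obfuscated) buffers. If every buffer is noisy, it returns a neutral
// 1.0 instead of dropping the mock from consideration.
func averageSimilarity(sims []float64, noisy []bool) float64 {
	var simSum float64
	var simCount int
	for i, s := range sims {
		if noisy[i] {
			continue // obfuscated buffer: excluded from the score
		}
		simSum += s
		simCount++
	}
	if simCount == 0 {
		return 1.0 // fully-redacted interaction: neutral, still matchable
	}
	return simSum / float64(simCount)
}

func main() {
	fmt.Println(averageSimilarity([]float64{0.4, 0.8}, []bool{false, true})) // prints 0.4
	fmt.Println(averageSimilarity([]float64{0.4, 0.8}, []bool{true, true}))  // prints 1
}
```

Treating the fully-noisy case as 1.0 biases selection toward redacted mocks when nothing else distinguishes them; a different tie-breaker (e.g. request ordering) is an equally valid design choice.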
```go
compiled, err := regexp.Compile(pattern)
if err != nil {
	compiled = nil // will be cached as negative result
}
```
Invalid noise regex patterns are silently cached as nil and then ignored by NewNoiseChecker, which can disable obfuscation-awareness without any visibility (leading to surprising mismatches in production). Consider returning an error (or collecting invalid patterns) from NewNoiseChecker, or at least emitting a one-time log/metric when a pattern fails to compile so operators can detect misconfigured Mock.Noise.
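One way to add that visibility is to validate patterns once at construction time, as in the sketch below. This `NoiseChecker` is a simplified stand-in for the `util` type (its real fields and methods may differ); the point is that each invalid `Mock.Noise` pattern is logged exactly once rather than silently dropped:

```go
package main

import (
	"fmt"
	"log"
	"regexp"
)

// NoiseChecker holds the compiled noise patterns (simplified stand-in).
type NoiseChecker struct {
	patterns []*regexp.Regexp
}

// NewNoiseChecker compiles the noise patterns up front, logging each one
// that fails so operators can detect misconfigured Mock.Noise entries.
func NewNoiseChecker(noise []string) *NoiseChecker {
	if len(noise) == 0 {
		return nil
	}
	nc := &NoiseChecker{}
	for _, p := range noise {
		re, err := regexp.Compile(p)
		if err != nil {
			// Surface the misconfiguration instead of silently ignoring it.
			log.Printf("invalid noise pattern %q: %v (pattern skipped)", p, err)
			continue
		}
		nc.patterns = append(nc.patterns, re)
	}
	return nc
}

// IsNoisy reports whether value matches any valid noise pattern.
func (nc *NoiseChecker) IsNoisy(value string) bool {
	for _, re := range nc.patterns {
		if re.MatchString(value) {
			return true
		}
	}
	return false
}

func main() {
	nc := NewNoiseChecker([]string{"[bad", "^secret-"}) // logs the bad pattern once
	fmt.Println(nc.IsNoisy("secret-abc"), nc.IsNoisy("plain")) // prints true false
}
```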
```go
// HasExtraNonNoisyKeys checks whether reqVal contains keys not present in
// mockVal (excluding keys whose mock value is noisy). Returns true if extra
// non-noisy keys exist, meaning the request is not an exact match.
func HasExtraNonNoisyKeys(mockVal, reqVal interface{}, nc *NoiseChecker) bool {
```
The doc comment doesn’t match the implementation: the function always treats mock keys as present regardless of whether their values are noisy (it skips value comparison/recursion for noisy values, but not key presence). Update the comment to reflect actual behavior (e.g., ‘extra keys not present in the mock cause mismatch; nested extras under noisy branches are ignored’) to prevent incorrect usage/assumptions by future callers.
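The intended key-presence semantics can be illustrated with the sketch below. It deliberately omits the `NoiseChecker` parameter and the noisy-value handling to focus on the point in dispute: every mock key counts as present, and only an extra key on the request side causes a mismatch:

```go
package main

import "fmt"

// HasExtraNonNoisyKeys reports whether reqVal has keys absent from mockVal.
// All mock keys count as present; in the real implementation, noise only
// suppresses value comparison/recursion, never key presence.
func HasExtraNonNoisyKeys(mockVal, reqVal interface{}) bool {
	mv, ok1 := mockVal.(map[string]interface{})
	rv, ok2 := reqVal.(map[string]interface{})
	if !ok1 || !ok2 {
		return false
	}
	for key, rsub := range rv {
		msub, present := mv[key]
		if !present {
			return true // extra key in the request: not an exact match
		}
		if HasExtraNonNoisyKeys(msub, rsub) {
			return true
		}
	}
	return false
}

func main() {
	mock := map[string]interface{}{"a": 1.0}
	req := map[string]interface{}{"a": 1.0, "b": 2.0}
	fmt.Println(HasExtraNonNoisyKeys(mock, req)) // prints true
}
```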
- findBinaryMatch: treat fully-noisy mocks as neutral (1.0) so they remain matchable when all buffers are obfuscated
- NewNoiseChecker: log invalid regex patterns at construction time so operators can detect misconfigured Mock.Noise
- HasExtraNonNoisyKeys: update doc comment to accurately reflect that all mock keys are present regardless of noise; only value comparison is skipped for noisy fields

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
🚀 Keploy Performance Test Results

Multi-Run Validation: Tests run 3 times; the pipeline fails only if 2+ runs show regression.
Thresholds: P50 < 5ms, P90 < 15ms, P99 < 70ms, RPS >= 100 (±1% tolerance), Error Rate < 1%
✅ Result: PASSED (0 out of 3 runs failed; threshold: 2)
P50, P90, and P99 percentiles naturally filter out outliers.
Summary

- Replace prefix-based obfuscation helpers (`IsObfuscated`, `ContainsObfuscatedValue`) with noise-pattern matching using `Mock.Noise` regex patterns
- Add a `NoiseChecker` type in `util/obfuscate.go` with compile/cache/check methods that handles all obfuscated character classes (alphanumeric, digit-only, hex) uniformly
- Update JSON matching (`JSONDiffWithNoiseControl`), MySQL `paramValueEqual`, and Generic binary match to use noise-based detection
- `matchJSONWithNoiseHandlingIndexed` change

Parsers changed

- `ExactBodyMatch`, `PerformFuzzyMatch` to use `NoiseChecker` from `mock.Noise`
- `JSONDiffWithNoiseControl` accepts `*NoiseChecker`; covers HTTP Matcher + gRPC Matcher
- `paramValueEqual` skips noisy mock param values
- `findExactMatch`/`findBinaryMatch` skip noisy mock buffers

Related PRs

Test plan

- Unit tests cover `JSONBodyMatchScore`, `StripNoisyJSON`, `ExactBodyMatch`
- `go build ./...` passes

🤖 Generated with Claude Code