Improve provider-proxy cache determinism for read-only mode#7234

Open
AntoineToussaint wants to merge 4 commits into aaron/provider-proxy-require-cache from ci-read-only-proxy-cache

@AntoineToussaint
Member

Summary

Built on top of #7205. Reduces cache misses when running in read-only-require-hit mode:

  • Body sanitization: Normalize UUIDs and random localhost ports in request bodies before computing cache keys (--sanitize-body flag, on by default). This makes cache keys deterministic across test runs where tests embed random values in prompts.
  • Cache miss manifest: Write a cache-misses.json file with details of every cache miss (--cache-miss-manifest flag). Useful for debugging which tests bypass the cache.
  • Bypass proxy for raw.githubusercontent.com: Added to the no_proxy list since these are file fetches, not provider API calls.
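The body sanitization step described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code in this PR: the function names, the placeholder values, and the exact matching rules (canonical 8-4-4-4-12 hex UUIDs, and digits following `127.0.0.1:` or `localhost:`) are all assumptions made for the example.

```rust
// Hypothetical placeholders; the PR's actual replacement values may differ.
const UUID_PLACEHOLDER: &str = "00000000-0000-0000-0000-000000000000";
const PORT_PLACEHOLDER: &str = "0";
// Assumed set of localhost prefixes whose ports are treated as random.
const LOCALHOST_PREFIXES: [&str; 2] = ["127.0.0.1:", "localhost:"];

/// Returns true if `bytes` begins with a canonical 8-4-4-4-12 hex UUID.
fn starts_with_uuid(bytes: &[u8]) -> bool {
    if bytes.len() < 36 {
        return false;
    }
    bytes[..36].iter().enumerate().all(|(i, b)| match i {
        8 | 13 | 18 | 23 => *b == b'-',
        _ => b.is_ascii_hexdigit(),
    })
}

/// Replaces UUIDs and localhost ports with fixed placeholders so that two
/// bodies differing only in those random values become identical.
fn sanitize_body(body: &str) -> String {
    let bytes = body.as_bytes();
    let mut out = String::with_capacity(body.len());
    let mut i = 0; // always kept on a UTF-8 character boundary
    while i < bytes.len() {
        if starts_with_uuid(&bytes[i..]) {
            out.push_str(UUID_PLACEHOLDER);
            i += 36;
            continue;
        }
        let rest = &body[i..];
        if let Some(prefix) = LOCALHOST_PREFIXES.iter().find(|p| rest.starts_with(**p)) {
            // Count the port digits that follow the prefix.
            let digits = rest[prefix.len()..]
                .bytes()
                .take_while(u8::is_ascii_digit)
                .count();
            if digits > 0 {
                out.push_str(prefix);
                out.push_str(PORT_PLACEHOLDER);
                i += prefix.len() + digits;
                continue;
            }
        }
        // No match: copy one full character and advance past it.
        let ch = rest.chars().next().expect("i is within the string");
        out.push(ch);
        i += ch.len_utf8();
    }
    out
}
```

With a function like this, two request bodies that differ only in an embedded UUID or a localhost port sanitize to the same string, so any hash computed over the sanitized body yields the same cache key on every run. The sketch does not check word boundaries around a match, which a real implementation would likely want.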

What this fixes

From the failure analysis of #7205:

  • Tests that embed random UUIDs in prompts (e.g. "...something went wrong: 019d641f-3802...") → fixed by UUID normalization
  • Tests that use 127.0.0.1:<random_port> in image URLs → fixed by port normalization
  • Tests that fetch images from raw.githubusercontent.com → fixed by bypassing proxy
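For illustration, the proxy bypass amounts to a host check against a no_proxy list. The sketch below is an assumption about how such a check might look, not the code in this PR: the function name `bypasses_proxy` and the matching rule (case-insensitive exact match or dot-suffix match) are hypothetical.

```rust
/// Returns true if `host` matches an entry in `no_proxy`.
/// Assumed rule: case-insensitive exact match, or `host` is a subdomain
/// of the entry. Real no_proxy handling may differ.
fn bypasses_proxy(host: &str, no_proxy: &[&str]) -> bool {
    let host = host.to_ascii_lowercase();
    no_proxy.iter().any(|entry| {
        // Normalize the entry: trim whitespace and any leading dot.
        let entry = entry.trim().trim_start_matches('.').to_ascii_lowercase();
        !entry.is_empty() && (host == entry || host.ends_with(&format!(".{entry}")))
    })
}
```

Under this rule, requests to raw.githubusercontent.com go straight to the origin and never reach the cache, so they cannot produce a cache miss in read-only-require-hit mode.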

Test plan

  • cargo clippy passes for provider-proxy and tensorzero-core
  • CI runs with read-only-require-hit mode show fewer cache misses
  • Cache miss manifest file is written correctly

🤖 Generated with Claude Code

- Add body sanitization: normalize UUIDs and random localhost ports in
  request bodies before computing cache keys (--sanitize-body flag)
- Add cache miss manifest: write cache-misses.json with details of every
  cache miss for debugging (--cache-miss-manifest flag)
- Bypass proxy for raw.githubusercontent.com (not a provider API)

These changes reduce cache misses when running in read-only-require-hit
mode by making cache keys deterministic across test runs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
AntoineToussaint and others added 2 commits April 8, 2026 14:06
Changing the hash algorithm invalidates all existing cache entries.
Default to false so the existing R2 cache remains valid. Enable
after repopulating the cache with the new algorithm.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Body sanitization (UUID/port normalization) invalidates the entire
existing R2 cache. Defer that to a follow-up with coordinated cache
repopulation. This commit keeps only:
- Cache miss manifest (--cache-miss-manifest flag)
- raw.githubusercontent.com in no_proxy list

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Normalize UUIDs and random localhost ports in request bodies before
computing cache keys. This makes cache keys deterministic across test
runs where tests embed random values in prompts or image URLs.

Enabled by default (--sanitize-body=true). This will invalidate the
existing R2 cache — a one-time cache repopulation in write mode is
needed after merging.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>