
Make e2e tests deterministic for provider-proxy cache#7248

Closed
AntoineToussaint wants to merge 6 commits into main from fix-deterministic-e2e-tests

Conversation


@AntoineToussaint AntoineToussaint commented Apr 8, 2026

Summary

Make e2e tests deterministic so that provider-proxy cache keys are the same across test runs. This is a prerequisite for enabling read-only-require-hit mode in CI (#7205).

Changes

Random values in prompts → fixed strings

  • providers/common.rs: Remove Uuid::now_v7() from bad auth test prompt
  • providers/anthropic.rs: Remove Uuid::now_v7() from thinking test prompt
  • aggregated_response/mod.rs: Remove UUID suffix from streaming test prompt
  • otel.rs: Use fixed string for OTEL tag value instead of UUID
  • test_client.py: Remove uuid7() from extra headers test prompt
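
Why removing the random values helps can be sketched as follows, assuming the provider-proxy derives its cache key from a hash of the canonicalized request body (an assumption for illustration; the actual key derivation is not shown in this PR):

```python
import hashlib
import json
import uuid

def cache_key(body: dict) -> str:
    """Hypothetical cache key: SHA-256 of the canonicalized request body."""
    canonical = json.dumps(body, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()

# A random UUID in the prompt yields a different body, and so a
# different cache key, on every test run.
random_body = {"model": "gpt-x", "prompt": f"bad auth test {uuid.uuid4()}"}
# A fixed prompt string yields the same key across runs.
fixed_body = {"model": "gpt-x", "prompt": "bad auth test"}

assert cache_key(fixed_body) == cache_key(dict(fixed_body))
assert cache_key(random_body) != cache_key(
    {"model": "gpt-x", "prompt": f"bad auth test {uuid.uuid4()}"}
)
```

With stable keys, a read-only-require-hit proxy in CI can serve every provider request from a previously recorded cache entry.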

Embeddings cache tests → use write_only mode

  • raw_response/embeddings.rs: Use cache_options.enabled = "write_only" for the first request to bypass Valkey reads while still populating cache for the second request
  • test_embeddings.py: Same fix for Python tests
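
The two-request pattern can be sketched like this (the payload shape around `cache_options` is an assumption; only `cache_options.enabled` is named in the PR text, and the model name is hypothetical):

```python
# Shared, fully deterministic request body for both calls.
base = {
    "model_name": "embedding-model",          # hypothetical name
    "input": "fixed deterministic input",     # no random suffix needed
}

# First request: "write_only" skips Valkey cache reads (so a stale entry
# from a previous run can't satisfy it) but still writes the fresh result.
first = {**base, "cache_options": {"enabled": "write_only"}}

# Second request: normal mode, expected to hit the entry just written.
second = {**base, "cache_options": {"enabled": "on"}}

assert first["input"] == second["input"]
assert first["cache_options"]["enabled"] == "write_only"
```

This keeps the cache test meaningful (the second request still exercises a real cache hit) without relying on random inputs to dodge earlier runs.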

Image URL tests → fixed ports

  • image_url.rs: Use fixed port 19876 instead of OS-assigned port 0
  • providers/common.rs: Use fixed port 19877 for image server
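
The port issue is that binding to port 0 asks the OS for any free ephemeral port, so the image URL embedded in the provider request (and therefore the cache key) changes between runs. A minimal illustration:

```python
import socket

def bind_and_get_port(port: int) -> int:
    """Bind a listener and return the port the OS actually assigned."""
    s = socket.socket()
    s.bind(("127.0.0.1", port))
    actual = s.getsockname()[1]
    s.close()
    return actual

# Port 0 -> OS picks some free ephemeral port, different across runs.
assert bind_and_get_port(0) != 0
# A fixed port -> the same image URL (e.g. http://127.0.0.1:19876/...)
# appears in the request body on every run.
assert bind_and_get_port(19876) == 19876
```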

GitHub file fetches → bypass proxy

  • http.rs: Add raw.githubusercontent.com to no_proxy list (file fetches, not provider API calls)
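
A rough sketch of no_proxy-style host matching (this is a simplified model, not the logic in http.rs: real no_proxy handling also covers ports, CIDR ranges, and `*`):

```python
def bypasses_proxy(host: str, no_proxy: str) -> bool:
    """A host bypasses the proxy if it equals a no_proxy entry
    or is a subdomain of one."""
    for entry in no_proxy.split(","):
        entry = entry.strip().lstrip(".")
        if entry and (host == entry or host.endswith("." + entry)):
            return True
    return False

no_proxy = "localhost,raw.githubusercontent.com"
# File fetches from GitHub skip the provider-proxy...
assert bypasses_proxy("raw.githubusercontent.com", no_proxy)
# ...while actual provider API calls still go through it.
assert not bypasses_proxy("api.anthropic.com", no_proxy)
```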

Audit of remaining random values

There are ~1200 remaining Uuid::now_v7() calls in e2e tests. None of them affect provider-proxy cache keys, because each falls into one of these categories:

  • episode_id / inference_id: TensorZero metadata, never included in the request body sent to LLM providers
  • Used with dummy:: models: Don't go through the provider-proxy at all
  • In feedback/dataset/rendering tests: No LLM provider calls involved

Test plan

  • cargo clippy passes
  • All pre-commit hooks pass
  • CI passes

🤖 Generated with Claude Code

AntoineToussaint and others added 4 commits April 8, 2026 18:30
Replace random values (UUIDs, random integers) in LLM provider request
bodies with fixed strings. This makes provider-proxy cache keys
deterministic across test runs, enabling read-only cache mode in CI.

Changes:
- providers/common.rs: Remove UUID from bad auth test prompt
- providers/anthropic.rs: Remove UUID from thinking test prompt
- aggregated_response/mod.rs: Remove UUID suffix from prompt
- otel.rs: Use fixed string for tag value
- raw_response/embeddings.rs: Use fixed string for embedding input
- test_client.py: Remove UUID from extra headers test prompt
- test_embeddings.py: Use fixed strings for cache test inputs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
These tests need random inputs to avoid hitting the internal Valkey
cache from previous runs. The provider-proxy cache will need body
sanitization to handle these — they can't be made deterministic
without breaking the cache-testing logic.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use cache_options.enabled = "write_only" for the first request to
bypass Valkey cache reads while still populating the cache for the
second request. This makes the tests deterministic without needing
random input text.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Use fixed ports (19876, 19877) for test image servers instead of
  OS-assigned port 0, making image URLs in provider requests deterministic
- Add raw.githubusercontent.com to no_proxy list so image fetches
  bypass the provider-proxy (they're not provider API calls)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
    tags: HashMap::from([
        ("first_tag".to_string(), "first_value".into()),
        ("second_tag".to_string(), "second_value".into()),
        ("user_id".to_string(), Uuid::now_v7().to_string()),

I don't think this should be needed, since the tags don't get included in model inference requests

@Aaron1011

Would you mind splitting out the changes to tensorzero cache tests (e.g. test_embeddings_raw_response_with_cache) into a separate PR? I'm a little nervous about losing some of our cache test coverage, but the other stuff is good to merge.

AntoineToussaint and others added 2 commits April 9, 2026 10:23
- Use separate payload with "enabled": "on" for the second (cached)
  request in embeddings test (first request uses "write_only")
- Revert providers/common.rs image server to port 0 (uses fetch=true,
  so URL doesn't reach provider)
- Use port 0 for fetch_true image test, fixed port 19876 only for
  fetch_false (where URL IS in the provider request)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Per review feedback, revert the embeddings cache test changes
(write_only approach) so they can be reviewed separately.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>