src/eval/adapters

Concrete LLMClient adapters for the eval harness. The judge in src/eval/judge.py calls an LLMClient Protocol — never a vendor SDK directly. Each adapter in this package implements that Protocol for one provider, so the eval core stays vendor-neutral and a downstream consumer can swap providers by changing one wiring line in their test fixture.

Key interfaces

Exported from this package:

  • AzureOpenAIClient — implements src.eval.judge.LLMClient. Construct from env via AzureOpenAIClient(); call complete(prompt) for runner answer_fn use, complete_json(*, model, prompt) for judge use. The model argument on complete_json is accepted for Protocol conformance and discarded — Azure addresses by deployment name (set at construction time, read from AZURE_OPENAI_DEPLOYMENT).
  • AzureOpenAIConfigError — raised at construction when required env is missing or the optional openai extra is not installed. Subclass of RuntimeError. The error message names every missing env var in one go so the caller doesn't have to fix-and-retry.
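As a rough illustration of the seam described above, the sketch below assumes the LLMClient Protocol has the single method named in this README (complete_json with keyword-only model and prompt). DeploymentBoundClient and judge_once are hypothetical stand-ins, not code from this package; the real AzureOpenAIClient calls the Azure SDK instead of returning canned strings.

```python
import json
from typing import Protocol


class LLMClient(Protocol):
    """Assumed shape of the Protocol defined in src/eval/judge.py."""

    def complete_json(self, *, model: str, prompt: str) -> str: ...


class DeploymentBoundClient:
    """Hypothetical adapter that addresses the provider by deployment name."""

    def __init__(self, deployment: str) -> None:
        # Fixed at construction time, as the Azure adapter does with its deployment.
        self._deployment = deployment

    def complete(self, prompt: str) -> str:
        # Plain-text path, suitable for use as a runner answer function.
        return f"[{self._deployment}] {prompt}"

    def complete_json(self, *, model: str, prompt: str) -> str:
        # `model` is accepted for Protocol conformance and then discarded.
        del model
        return json.dumps({"deployment": self._deployment, "prompt": prompt})


def judge_once(client: LLMClient, prompt: str) -> str:
    """Hypothetical call site: it depends on the Protocol, not on a vendor SDK."""
    return client.complete_json(model="ignored-by-this-adapter", prompt=prompt)


if __name__ == "__main__":
    print(judge_once(DeploymentBoundClient("my-deployment"), "score this answer"))
```

Because conformance is structural, the stand-in never inherits from LLMClient; swapping providers means passing a different object to the call site.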

Why this layer exists

Without the Protocol seam, swapping LLM providers would mean touching the eval core. With it, vendor lock-in is confined to one file per provider. The layer demonstrates that the harness's "provider-agnostic" claim is structural, not aspirational: the eval core has zero imports of any vendor SDK.
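One way to keep the "zero vendor imports" property checkable is a small static scan. The sketch below is an assumption about how such a check could look, not something this repository is known to ship; VENDOR_MODULES and vendor_imports are hypothetical names, and the vendor list would need to match the SDKs actually in use.

```python
import ast

# Assumption: top-level module names of the vendor SDKs to keep out of the eval core.
VENDOR_MODULES = frozenset({"openai"})


def vendor_imports(source: str, vendor_modules: frozenset = VENDOR_MODULES) -> set:
    """Return the vendor top-level modules imported anywhere in `source`."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # Relative imports (level > 0) can never name a vendor SDK.
            names = [node.module]
        else:
            continue
        for name in names:
            top_level = name.split(".")[0]
            if top_level in vendor_modules:
                found.add(top_level)
    return found


if __name__ == "__main__":
    core_like = "from typing import Protocol\nimport json\n"
    adapter_like = "def build():\n    from openai import AzureOpenAI\n"
    print(vendor_imports(core_like), vendor_imports(adapter_like))
```

Because ast.walk visits nested nodes, the scan also catches lazy imports inside functions, which is where an adapter is expected to keep them.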

Current adapters

| File | Provider | Optional extra | Env contract |
| --- | --- | --- | --- |
| azure_openai.py | Azure OpenAI | uv sync --extra eval | AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY, AZURE_OPENAI_DEPLOYMENT, optional AZURE_OPENAI_API_VERSION (default 2024-10-21) |

Adding a new adapter

  1. Add the SDK to [project.optional-dependencies] in pyproject.toml — either to the existing eval extra or a new provider-scoped one.
  2. Add the SDK's top-level module to [[tool.mypy.overrides]] with ignore_missing_imports = true, matching the existing openai.* / opentelemetry.* entries. This keeps mypy clean on stock uv sync --extra dev checkouts.
  3. Implement complete_json(*, model: str, prompt: str) -> str per the LLMClient Protocol in src/eval/judge.py. Optionally add a complete(prompt: str) -> str for use as an EvalRunner.answer_fn.
  4. Lazy-import the SDK inside __init__ so the adapter module remains importable without the optional extra installed. The import error path should raise a clear, named exception (e.g. AzureOpenAIConfigError) telling the reader which uv sync --extra ... to run.
  5. Read configuration from environment variables at construction time. Raise the same named exception listing every missing var when env is incomplete — fail fast, fail clear.
  6. Add an offline unit test in tests/ that mocks the SDK at the sys.modules level (see tests/test_eval_azure_openai_adapter.py for the pattern). This keeps the unit suite credential-free; live-credential paths are exercised by eval/test_golden_patterns.py.
  7. Document the env contract in this README's table above and in docs/EVAL_HARNESS.md's "Worked patterns" section.
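Steps 3 to 6 above can be sketched as follows. Everything provider-specific here is hypothetical: acme_llm, its Client class and generate method, the ACME_* variable names, and AcmeClient / AcmeConfigError are invented for illustration and do not describe a real SDK or the existing Azure adapter. The offline check patches sys.modules, in the spirit of step 6; the actual pattern in tests/test_eval_azure_openai_adapter.py may differ.

```python
import importlib
import os
import sys
import types
from unittest import mock


class AcmeConfigError(RuntimeError):
    """Raised at construction when env is incomplete or the SDK is not installed."""


class AcmeClient:
    """Hypothetical adapter for a made-up provider SDK called `acme_llm`."""

    REQUIRED_ENV = ("ACME_ENDPOINT", "ACME_API_KEY")  # hypothetical names

    def __init__(self) -> None:
        # Collect every missing variable first so one error names them all.
        missing = [name for name in self.REQUIRED_ENV if not os.environ.get(name)]
        if missing:
            raise AcmeConfigError("missing environment variables: " + ", ".join(missing))
        try:
            # Lazy import: this module stays importable without the optional extra.
            sdk = importlib.import_module("acme_llm")
        except ImportError as exc:
            raise AcmeConfigError(
                "acme_llm is not installed; run: uv sync --extra eval"
            ) from exc
        self._client = sdk.Client(
            endpoint=os.environ["ACME_ENDPOINT"],
            api_key=os.environ["ACME_API_KEY"],
        )

    def complete(self, prompt: str) -> str:
        return self._client.generate(prompt)

    def complete_json(self, *, model: str, prompt: str) -> str:
        # This sketch ignores `model`; a provider that selects by model would pass it on.
        return self._client.generate(prompt)


def make_fake_sdk() -> types.ModuleType:
    """Build an in-memory replacement for the hypothetical `acme_llm` module."""

    class FakeSDKClient:
        def __init__(self, endpoint: str, api_key: str) -> None:
            self.endpoint = endpoint

        def generate(self, prompt: str) -> str:
            return f"echo:{prompt}"

    module = types.ModuleType("acme_llm")
    module.Client = FakeSDKClient
    return module


def offline_check() -> str:
    """Exercise the adapter with no credentials and no real SDK installed."""
    env = {"ACME_ENDPOINT": "https://example.invalid", "ACME_API_KEY": "test-key"}
    with mock.patch.dict(sys.modules, {"acme_llm": make_fake_sdk()}), \
            mock.patch.dict(os.environ, env):
        return AcmeClient().complete_json(model="unused", prompt="ping")
```

Checking the environment before importing the SDK is a choice made for this sketch; either order satisfies the steps as long as both failures raise the same named exception.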

Why adapters live under src/eval/

The import-linter contract in pyproject.toml puts src.eval at the top of the layered import order:

api | eval -> agent -> tools -> data -> observability -> models

Adapters can therefore depend on any layer below src.eval, and nothing in those lower layers depends on them. That asymmetry is exactly what the layered architecture exists to encode: vendor-specific code stays at the boundary and never leaks down into the eval primitives or the model layer.
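The layered order can be read as a simple rule: a package may import only from layers strictly below its own. The sketch below encodes that reading under one assumption, namely that the `|` between api and eval marks them as siblings on the same level that do not import each other. LAYERS, layer_index and may_import are hypothetical names; the enforced contract is the one in pyproject.toml.

```python
# Top-to-bottom layer order, taken from the contract line above.
# Assumption: `api | eval` are siblings sharing the top level.
LAYERS = [
    {"api", "eval"},
    {"agent"},
    {"tools"},
    {"data"},
    {"observability"},
    {"models"},
]


def layer_index(package: str) -> int:
    """Return the position of `package` in the layer order (0 is the top)."""
    for index, layer in enumerate(LAYERS):
        if package in layer:
            return index
    raise KeyError(f"unknown package: {package}")


def may_import(importer: str, imported: str) -> bool:
    """True when `importer` sits strictly above `imported` in the order."""
    return layer_index(importer) < layer_index(imported)


if __name__ == "__main__":
    print(may_import("eval", "models"))  # adapters reaching down
    print(may_import("agent", "eval"))   # lower layer reaching up
```

Imports within a single package are outside the scope of this check, so may_import is only meaningful for two different packages.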