Concrete `LLMClient` adapters for the eval harness. The judge in `src/eval/judge.py` calls an `LLMClient` Protocol, never a vendor SDK directly. Each adapter in this package implements that Protocol for one provider, so the eval core stays vendor-neutral and a downstream consumer can swap providers by changing one wiring line in their test fixture.
Exported from this package:
- `AzureOpenAIClient`: implements `src.eval.judge.LLMClient`. Construct from env via `AzureOpenAIClient()`; call `complete(prompt)` for runner `answer_fn` use, `complete_json(*, model, prompt)` for judge use. The `model` argument on `complete_json` is accepted for Protocol conformance and discarded; Azure addresses by deployment name (set at construction time, read from `AZURE_OPENAI_DEPLOYMENT`).
- `AzureOpenAIConfigError`: raised at construction when required env is missing or the optional `openai` extra is not installed. Subclass of `RuntimeError`. The error message names every missing env var in one go so the caller doesn't have to fix-and-retry.
Without the Protocol seam, swapping LLM providers would mean touching the eval core. With it, vendor lock-in is confined to one file per provider. The layer demonstrates that the harness's "provider-agnostic" claim is structural, not aspirational: the eval core has zero imports of any vendor SDK.
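A minimal sketch of that seam, for illustration only. The `LLMClient` shape below is inferred from the `complete_json(*, model, prompt)` signature described in this README; the real Protocol lives in `src/eval/judge.py` and may differ. `CannedClient` and `judge_once` are hypothetical names, not part of the package.

```python
from typing import Protocol


class LLMClient(Protocol):
    """Assumed shape of the Protocol; the real one is in src/eval/judge.py."""

    def complete_json(self, *, model: str, prompt: str) -> str: ...


class CannedClient:
    """Hypothetical stand-in adapter that returns a fixed JSON string."""

    def __init__(self, reply: str) -> None:
        self.reply = reply

    def complete_json(self, *, model: str, prompt: str) -> str:
        # Like the Azure adapter, this stub accepts `model` and ignores it.
        return self.reply


def judge_once(client: LLMClient, prompt: str) -> str:
    """Hypothetical judge-side caller: it sees the Protocol, never an SDK."""
    return client.complete_json(model="ignored-by-this-stub", prompt=prompt)


# Swapping providers means passing a different object on this one line.
verdict = judge_once(CannedClient('{"score": 1}'), "Grade this answer.")
```

Because `judge_once` is typed against the Protocol, any class with a matching `complete_json` method satisfies it structurally, with no inheritance and no vendor import in the caller.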
| File | Provider | Optional extra | Env contract |
|---|---|---|---|
| `azure_openai.py` | Azure OpenAI | `uv sync --extra eval` | `AZURE_OPENAI_ENDPOINT`, `AZURE_OPENAI_API_KEY`, `AZURE_OPENAI_DEPLOYMENT`, optional `AZURE_OPENAI_API_VERSION` (default `2024-10-21`) |
- Add the SDK to `[project.optional-dependencies]` in `pyproject.toml`, either to the existing `eval` extra or a new provider-scoped one.
- Add the SDK's top-level module to `[[tool.mypy.overrides]]` with `ignore_missing_imports = true`, matching the existing `openai.*` / `opentelemetry.*` entries. This keeps mypy clean on stock `uv sync --extra dev` checkouts.
- Implement `complete_json(*, model: str, prompt: str) -> str` per the `LLMClient` Protocol in `src/eval/judge.py`. Optionally add a `complete(prompt: str) -> str` for use as an `EvalRunner.answer_fn`.
- Lazy-import the SDK inside `__init__` so the adapter module remains importable without the optional extra installed. The import error path should raise a clear, named exception (e.g. `AzureOpenAIConfigError`) telling the reader which `uv sync --extra ...` to run.
- Read configuration from environment variables at construction time. Raise the same named exception listing every missing var when env is incomplete: fail fast, fail clear.
- Add an offline unit test in `tests/` that mocks the SDK at the `sys.modules` level (see `tests/test_eval_azure_openai_adapter.py` for the pattern). This keeps the unit suite credential-free; live-credential paths are exercised by `eval/test_golden_patterns.py`.
- Document the env contract in this README's table above and in `docs/EVAL_HARNESS.md`'s "Worked patterns" section.
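The lazy-import, env-validation, and offline-test steps above might look roughly like the sketch below. Everything provider-specific here is hypothetical: `example_vendor_sdk`, its `Client` class and `generate` method, the `EXAMPLE_*` variables, and `ExampleConfigError` are placeholders chosen for illustration, not a real SDK and not the code in `azure_openai.py` or `tests/test_eval_azure_openai_adapter.py`.

```python
import importlib
import os
import sys
import types
from unittest import mock

# Hypothetical env contract for an imaginary provider.
REQUIRED_ENV = ("EXAMPLE_ENDPOINT", "EXAMPLE_API_KEY", "EXAMPLE_DEPLOYMENT")


class ExampleConfigError(RuntimeError):
    """Hypothetical named exception, in the spirit of AzureOpenAIConfigError."""


class ExampleClient:
    def __init__(self) -> None:
        # Lazy import: the module defining this class stays importable
        # even when the optional SDK is not installed.
        try:
            sdk = importlib.import_module("example_vendor_sdk")  # placeholder
        except ImportError as exc:
            raise ExampleConfigError(
                "example_vendor_sdk is not installed; run `uv sync --extra eval`"
            ) from exc

        # Collect every missing variable before raising, so the caller
        # sees the full list in one error.
        missing = [name for name in REQUIRED_ENV if not os.environ.get(name)]
        if missing:
            raise ExampleConfigError("missing env vars: " + ", ".join(missing))

        self._deployment = os.environ["EXAMPLE_DEPLOYMENT"]
        self._sdk_client = sdk.Client(
            endpoint=os.environ["EXAMPLE_ENDPOINT"],
            api_key=os.environ["EXAMPLE_API_KEY"],
        )

    def complete_json(self, *, model: str, prompt: str) -> str:
        # `model` is accepted for Protocol conformance and ignored here.
        return self._sdk_client.generate(deployment=self._deployment, prompt=prompt)


# Offline test pattern: register a fake SDK module in sys.modules so the
# lazy import resolves without the real package or any credentials.
fake_sdk = types.ModuleType("example_vendor_sdk")
fake_sdk.Client = mock.MagicMock()
fake_sdk.Client.return_value.generate.return_value = '{"ok": true}'

env = {
    "EXAMPLE_ENDPOINT": "https://example.invalid",
    "EXAMPLE_API_KEY": "not-a-real-key",
    "EXAMPLE_DEPLOYMENT": "d",
}
with mock.patch.dict(sys.modules, {"example_vendor_sdk": fake_sdk}):
    with mock.patch.dict(os.environ, env):
        reply = ExampleClient().complete_json(model="ignored", prompt="hi")

    # With the environment emptied, construction should fail and name
    # every missing variable at once.
    with mock.patch.dict(os.environ, {}, clear=True):
        try:
            ExampleClient()
        except ExampleConfigError as exc:
            error_text = str(exc)
```

`mock.patch.dict` restores both `sys.modules` and `os.environ` on exit, so the fake module and the test variables do not leak into other tests.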
The import-linter contract in `pyproject.toml` puts `src.eval` at the top of the layered import order:
`api | eval -> agent -> tools -> data -> observability -> models`
Adapters can therefore depend on anything in `src/`; nothing in `src/` depends on them. That asymmetry is exactly what the layered architecture exists to encode: vendor-specific code stays at the boundary and never leaks down into the eval primitives or the model layer.