refactor: package-per-provider using uv workspaces and entry points; declarative distributions #5492

eoinfennessy wants to merge 36 commits into ogx-ai:main from
Conversation
…iscovery

Set up a uv workspace and extract the Ollama inference provider into a separate package (efenness-llama-stack-provider-inference-ollama) to validate the package-per-provider approach proposed in llamastack#5478.

Key changes:
- Add [tool.uv.workspace] to the root pyproject.toml with members for src/llama_stack_api and packages/*
- Create packages/llama-stack-provider-inference-ollama with its own pyproject.toml, entry point registration, and provider source code
- Add _discover_providers_from_entry_points() to distribution.py for entry-point-based provider discovery alongside the existing registry files
- Remove the original Ollama provider from src/llama_stack/providers/ and its registry entry, proving the entry-point path works standalone
- Update distribution configs via codegen, tests, and the native messages module path list

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Eoin Fennessy <efenness@redhat.com>
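The wiring described above can be sketched as two pyproject.toml fragments. The workspace members and the `llama_stack.providers` entry-point group are taken from this PR; the entry-point target (`get_provider_spec`) is an assumed name for illustration.

```toml
# Root pyproject.toml: declare the uv workspace and its members.
[tool.uv.workspace]
members = ["src/llama_stack_api", "packages/*"]

# packages/llama-stack-provider-inference-ollama/pyproject.toml:
# register the provider under the llama_stack.providers group so it is
# discoverable via importlib.metadata instead of an in-tree registry file.
[project]
name = "efenness-llama-stack-provider-inference-ollama"

[project.entry-points."llama_stack.providers"]
"inference.remote.ollama" = "llama_stack_provider_inference_ollama:get_provider_spec"
```

After `uv sync`, `entry_points(group="llama_stack.providers")` returns this entry point regardless of whether the provider appears in any registry file.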
Force-pushed from a2c190b to 924088e (Compare)
This pull request has merge conflicts that must be resolved before it can be merged. @eoinfennessy please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
Add discover_entry_point_providers() to distribution.py and a merge_entry_point_providers() utility in the registry package so that available_providers() includes entry-point-discovered providers alongside in-tree ones. This enables distribution config codegen to pick up providers that have been extracted into separate packages.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Eoin Fennessy <efenness@redhat.com>
The ollama provider package dependencies are managed by its pyproject.toml and installed via uv sync as a workspace member. Listing the package name in pip_packages caused uv pip install (which is not workspace-aware) to resolve it from PyPI, pulling in a stale version of llama-stack that overwrote the editable install.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Eoin Fennessy <efenness@redhat.com>
The distro codegen pre-commit hook needs entry-point-migrated providers to be installed so their provider specs are discoverable. Change the hook to sync all workspace packages before running codegen, ensuring providers in packages/ are always available regardless of which dependency groups are active.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Eoin Fennessy <efenness@redhat.com>
…rashing

In environments where an entry-point provider package is not installed (e.g. container builds), the provider type won't be in the registry. Log a warning and skip instead of raising ValueError, since the provider's own package manages its dependencies.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Eoin Fennessy <efenness@redhat.com>
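A minimal sketch of this fallback behavior. The function name and registry shape are illustrative, not the actual Llama Stack API:

```python
import logging

logger = logging.getLogger(__name__)


def resolve_provider_spec(provider_type: str, registry: dict):
    """Return the spec for provider_type, or None with a warning when the
    provider's package is not installed (e.g. in container builds)."""
    spec = registry.get(provider_type)
    if spec is None:
        # Previously a missing type raised ValueError; since the provider's
        # own package manages its dependencies, a missing entry is survivable.
        logger.warning(
            "Provider type '%s' not found in registry; skipping "
            "(is its package installed in this environment?)",
            provider_type,
        )
    return spec
```

Callers then filter out the `None` results instead of aborting startup.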
… discovery

Introduce a new pattern for building Llama Stack distributions as standalone Python packages. Distributions declare their providers as dependencies in pyproject.toml and register a CONFIGS_DIR via the llama_stack.distributions entry point group. Config generation discovers installed providers via entry points, builds a base config with default storage/server settings, and deep-merges YAML patches (with _base chaining) on top.

New CLI commands:
- llama stack config generate: generate a run config from the current environment
- llama stack config show: print a resolved config to stdout

Also extends config resolution to discover installed distribution packages, adds auto-selection of a sole installed distribution for llama stack run, and includes a demo distribution package using ollama inference.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Eoin Fennessy <efenness@redhat.com>
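The deep merge with `_base` chaining can be sketched as follows. The `_base` key is from the commit message; the loader function shape is an assumption:

```python
def deep_merge(base: dict, patch: dict) -> dict:
    """Recursively merge `patch` onto `base`: nested dicts are merged key
    by key, while scalars and lists in the patch replace base values."""
    out = dict(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = deep_merge(out[key], value)
        else:
            out[key] = value
    return out


def resolve_patch(patches: dict[str, dict], name: str) -> dict:
    """Resolve a named YAML patch, following `_base` references to parent
    patches; child values win over the parent's."""
    patch = dict(patches[name])
    base_name = patch.pop("_base", None)
    if base_name is None:
        return patch
    return deep_merge(resolve_patch(patches, base_name), patch)


# Example: a demo patch chained onto shared server settings.
patches = {
    "common": {"server": {"port": 8321}},
    "demo": {"_base": "common", "server": {"host": "127.0.0.1"}},
}
resolved = resolve_patch(patches, "demo")
```

The resolved patch is then merged onto the auto-generated base config the same way.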
Move InferenceStore and parse_filter from providers/utils/ into core/routers/, where they belong, since core routers are the only consumers. Backward-compatible re-export shims remain at the old import paths for existing provider code.

Remove 8 packages from core dependencies that are only used by providers: numpy, oci, oracledb, fire, prompt-toolkit, jsonschema, tornado, and urllib3. Move the tornado and urllib3 security pins to constraint-dependencies, where they apply transitively. Move tiktoken to the dev dependency group since it is only needed by provider tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Eoin Fennessy <efenness@redhat.com>
…ackages

Move provider utilities from src/llama_stack/providers/utils/ into five dedicated workspace packages: llama-stack-utils-inference, llama-stack-utils-vector-io, llama-stack-utils-safety, llama-stack-utils-bedrock, and llama-stack-utils-common. Each package has its own pyproject.toml with only the dependencies it needs.

The original file paths are preserved as sys.modules aliases so that all existing provider imports and unittest.mock.patch calls continue to work without modification. These shims will be removed as providers are extracted into their own packages in later phases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Eoin Fennessy <efenness@redhat.com>
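The sys.modules aliasing trick works because both the old and new dotted paths end up pointing at the same module object. A self-contained sketch (the real shim would import the extracted llama-stack-utils-inference package instead of building a synthetic module):

```python
import importlib
import sys
import types

# Stand-in for the extracted package; shown as a synthetic module so the
# sketch runs anywhere. In the real tree this is the installed
# llama-stack-utils-inference distribution (import name assumed).
extracted = types.ModuleType("llama_stack_utils_inference")
extracted.helper = lambda: "ok"

# Alias the old in-tree path to the extracted module. Because both names
# now resolve to the *same* module object, legacy imports and
# unittest.mock.patch("llama_stack.providers.utils.inference.helper", ...)
# targets keep working unchanged.
sys.modules["llama_stack.providers.utils.inference"] = extracted

# A legacy-style import now short-circuits on the sys.modules entry:
legacy = importlib.import_module("llama_stack.providers.utils.inference")
```

`importlib.import_module` checks `sys.modules` for the full dotted name first, so the alias resolves without the parent packages even existing.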
…ackages

Extract the 4 torch-dependent providers into their own packages under packages/, removing them from the in-tree registries and discovering them via entry points instead. This isolates heavy torch/transformers dependencies so they are only installed when the provider is needed.

New packages:
- llama-stack-provider-inference-sentence-transformers
- llama-stack-provider-inference-transformers
- llama-stack-provider-safety-prompt-guard
- llama-stack-provider-tool-runtime-file-search

Also adds merge_entry_point_providers() to the safety and tool_runtime registries (inference already had it) and fixes the list-deps test to use a provider that still has pip_packages declared.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Eoin Fennessy <efenness@redhat.com>
…ace packages

Extract all 20 remaining remote inference providers from the in-tree registry into their own workspace packages under packages/. The inference registry is now empty: all providers are discovered via entry points.

New packages: anthropic, azure, bedrock, cerebras, databricks, fireworks, gemini, groq, llama-cpp-server, llama-openai-compat, nvidia, oci, openai, passthrough, runpod, sambanova, together, vertexai, vllm, watsonx.

Also regenerates distribution configs and provider docs to reflect the new module paths and entry-point-based provider ordering.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Eoin Fennessy <efenness@redhat.com>
…ckages

Extract all 13 vector I/O providers (faiss, sqlite-vec, chromadb, milvus, qdrant, pgvector, weaviate, elasticsearch, oci, infinispan; chroma, milvus, and qdrant each have both inline and remote variants) into individual workspace packages discovered via entry points. For chroma, milvus, and qdrant, which share code between the inline and remote variants, the inline configs are moved into the remote package and the inline package depends on the remote package to avoid circular dependencies. The vector_io registry is now fully empty, with all providers discovered via entry points.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Eoin Fennessy <efenness@redhat.com>
…lity packages

The PoC ollama package was still using the efenness- prefix and importing from the old in-tree llama_stack.providers.utils paths. This renames it to llama-stack-provider-inference-ollama, adds llama-stack-utils-inference as a dependency, and rewrites imports to use the extracted utility packages.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Eoin Fennessy <efenness@redhat.com>
…rs into separate workspace packages

Extract 16 providers into individual workspace packages with entry-point discovery, completing Phase 6 of the package-per-provider refactoring:

- Safety (6): llama-guard, code-scanner, bedrock, nvidia, passthrough, sambanova
- Tool Runtime (5): brave-search, bing-search, tavily-search, wolfram-alpha, model-context-protocol
- Files (3): localfs, s3, openai
- File Processors (2): pypdf, docling

All four registry files (safety, tool_runtime, files, file_processors) now use merge_entry_point_providers with an empty in-tree list, making them fully entry-point driven. Provider imports are rewritten to use the extracted utility packages (llama-stack-utils-inference, llama-stack-utils-safety, llama-stack-utils-bedrock, llama-stack-utils-common, llama-stack-utils-vector-io).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Eoin Fennessy <efenness@redhat.com>
…s into separate workspace packages

Move the remaining providers with registry files into their own workspace packages with entry-point discovery. Registry files for batches, interactions, messages, and responses are now fully entry-point driven, completing the migration of all registry-based providers to the package-per-provider model.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Eoin Fennessy <efenness@redhat.com>
Create 7 distribution packages (starter, ci-tests, nvidia, oci, open-benchmark, watsonx, postgres-demo) that register via llama_stack.distributions entry points and expose their config YAML files. This enables distributions to be installed independently and discovered at runtime via importlib.metadata.

Update distro_codegen.py to sync generated configs into the distribution package directories alongside the in-tree copies. The in-tree distribution configs will eventually be removed once config generation moves to `llama stack config generate`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Eoin Fennessy <efenness@redhat.com>
Move llama-stack-api from src/llama_stack_api/ to packages/llama-stack-api/src/llama_stack_api/ so it follows the same src-layout as all other workspace members under packages/*. Update all references in pre-commit hooks, GitHub Actions workflows, scripts, mypy config, grandfathered file paths, and conformance docs.

The workspace members glob (packages/*) now covers this package without a separate entry. Switch the setuptools config from an explicit package/module list to packages.find with where=["src"].

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Eoin Fennessy <efenness@redhat.com>
Move src/llama_stack/ to packages/llama-stack/src/llama_stack/ so the core package becomes a workspace member alongside llama-stack-api and all provider/distribution packages. The root pyproject.toml is now a workspace-only config file with no [project] section.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Eoin Fennessy <efenness@redhat.com>
…ackages

Standardize all 73 workspace packages to use consistent setuptools-scm versioning with fallback_version = "0.7.2.dev0" and reorder TOML sections to follow a canonical layout: [project] -> [project.*] -> [build-system] -> [tool.*].

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Eoin Fennessy <efenness@redhat.com>
Update all three release workflows to handle ~73 workspace packages instead
of just llama-stack and llama-stack-api. pypi.yml now builds all packages in
a single job via `uv build --all-packages`. prepare-release.yml pins all
internal workspace dependencies to ==${VERSION} at release time.
post-release.yml bumps fallback_version across all workspace packages.
Also normalizes llama-stack package name from underscore to hyphen form.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Eoin Fennessy <efenness@redhat.com>
Resolve conflicts:
- providers-build.yml: keep workspace path, add upstream's PIP_CONSTRAINT_FILE arg
- pypi.yml: take upstream's pypi-publish v1.14.0 bump, keep our publish structure

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Eoin Fennessy <efenness@redhat.com>
The root pyproject.toml is now a workspace config without a [project] table, so uv cannot build it as an editable wheel. Workspace packages are already available via uv sync.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Eoin Fennessy <efenness@redhat.com>
The llama-stack package moved from src/ to packages/llama-stack/src/, adding two extra directory levels. The REPO_ROOT calculations using Path(__file__).parent chains were too shallow, causing the integration test recording/replay system to look for recordings in the wrong directory and fail to find them.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Eoin Fennessy <efenness@redhat.com>
The recording system was monkeypatching TavilySearchToolRuntimeImpl and PyPDFFileProcessorAdapter from in-tree paths, but the server loads these classes from the extracted workspace packages via entry points. Since they are different class objects, the patches had no effect and web search tests made unrecorded live calls that failed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Eoin Fennessy <efenness@redhat.com>
…project.toml

The root pyproject.toml is now a workspace definition without a [project] table, so `uv pip install -e .` fails. Update all CI workflows and the Containerfile to install the core package at packages/llama-stack instead.

Also sync the in-tree messages impl.py with cache metrics changes that landed on the extracted package during the main merge, and fix the backward-compat workflow to handle both old and new distribution paths.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Eoin Fennessy <efenness@redhat.com>
Delete all duplicated provider source code from the monolith now that every extracted provider lives in its own package under packages/. This eliminates the dual-copy divergence problem where merging main would update one copy but not the other (e.g., the messages cache metrics bug).

Key changes:
- Delete ~53 provider directories from providers/inline/ and providers/remote/
- Delete providers/utils/ (all utils extracted to llama-stack-utils-* packages)
- Move AssistantMessageWithReasoning to llama-stack-utils-inference
- Update ~225 test imports to use extracted package paths
- Update distribution config imports (starter, ci-tests, open-benchmark, etc.)
- Fix _NATIVE_INTERACTIONS_MODULES to use the extracted package path
- Fix cross-package imports (CompactionConfig, forward_headers, connectors/mcp)

Providers not yet extracted are left in-tree: agents, datasetio, eval, scoring, post_training, ios, inference/meta_reference, tool_runtime/rag.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Eoin Fennessy <efenness@redhat.com>
Remove 21 provider and utils packages from llama-stack dependencies that are no longer directly imported after deleting the in-tree copies. Add 5 provider packages newly needed by distribution configs (files-localfs, file-processor-pypdf, safety-nvidia, tool-runtime-brave-search, tool-runtime-tavily-search).

Also add llama-stack-client and qdrant-client to the unit test dependency group so all unit tests can run with --group unit.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Eoin Fennessy <efenness@redhat.com>
The ruff configuration in llama-stack-api's pyproject.toml was a duplicate of the root config, left over from before the workspace migration. Remove it and move the F403 per-file-ignore for API __init__.py files to the root config.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Eoin Fennessy <efenness@redhat.com>
Already specified in the root pyproject.toml, which covers the workspace.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Eoin Fennessy <efenness@redhat.com>
Dependencies are now resolved from the llama-stack-distribution-starter package's own pyproject.toml. The starter extra was a legacy flat list of ~50 third-party libraries predating the workspace structure.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Eoin Fennessy <efenness@redhat.com>
Replace the Python-based distribution config generators (starter.py at 418 lines of Python plus a 56-line build.yaml, and ci_tests.py at 120 lines of Python) with a declarative overlay system using structured YAML files validated by Pydantic.

Overlays support four top-level keys with a clear execution order:
1. base — chain to a parent overlay file
2. composition — pre-merge directives (e.g. storage: postgres)
3. patch — strategic merge data (providers merged by provider_type)
4. finalizers — post-merge transforms (e.g. conditional provider IDs)

The starter distribution is now two overlay files (56 lines of YAML), and ci-tests extends it with a 55-line overlay that adds watsonx, test connectors, pre-registered models, and auth config. Generated configs are verified semantically identical to the originals via yq.

Also removes the demo distribution package and the in-tree starter and ci-tests distribution copies, as both are superseded by the overlay system and extracted packages.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Eoin Fennessy <efenness@redhat.com>
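An overlay might look like the following. Only the four top-level keys and their execution order are from the commit message; every field value below is invented for illustration:

```yaml
# ci-tests overlay (hypothetical contents)
base: starter.yaml            # 1. chain to the parent overlay file

composition:                  # 2. pre-merge directives
  storage: postgres

patch:                        # 3. strategic merge onto the generated base;
  providers:                  #    providers are matched by provider_type
    inference:
      - provider_type: remote::watsonx

finalizers:                   # 4. post-merge transforms
  - conditional_provider_ids
```

Pydantic validation of the overlay schema means a typo in a key fails config generation instead of silently producing a broken run config.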
Convert the nvidia, watsonx, oci, open-benchmark, and postgres-demo distributions to the new overlay-based config generation system, replacing 1730 lines of Python config generators and static YAML with 128 lines of declarative overlay YAML.

Add a MergeStrategy enum and a MergeDirectives model to the ConfigOverlay schema, enabling per-API control over how provider lists are merged (strategic_merge vs replace).

Remove the in-tree distributions/ directory entirely, along with the DistributionTemplate system and 25 provider dependencies from the core llama-stack package that were only needed by the old generators.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Eoin Fennessy <efenness@redhat.com>
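The two merge strategies can be sketched as below. The enum values are from the commit message; the provider-list shapes and function name are illustrative, not the actual ConfigOverlay schema:

```python
from enum import Enum


class MergeStrategy(str, Enum):
    STRATEGIC_MERGE = "strategic_merge"
    REPLACE = "replace"


def merge_providers(
    base: list[dict],
    patch: list[dict],
    strategy: MergeStrategy = MergeStrategy.STRATEGIC_MERGE,
) -> list[dict]:
    """Merge one API's provider list. strategic_merge matches entries by
    provider_type and lets patch fields win; replace takes the patch list
    wholesale."""
    if strategy is MergeStrategy.REPLACE:
        return [dict(p) for p in patch]
    merged = {p["provider_type"]: dict(p) for p in base}
    for p in patch:
        merged.setdefault(p["provider_type"], {}).update(p)
    return list(merged.values())


# Strategic merge: the patch overrides one field and appends a new provider.
base = [{"provider_type": "remote::ollama", "url": "http://localhost:11434"}]
patch = [
    {"provider_type": "remote::ollama", "url": "http://ollama:11434"},
    {"provider_type": "inline::sentence-transformers"},
]
merged = merge_providers(base, patch)
```

Per-API MergeDirectives would select the strategy, so e.g. a distribution can fully replace its inference providers while strategically merging its safety providers.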
…nto core

Move mcp.py into llama_stack.core.mcp and TTLDict into llama_stack.core.utils.ttl_dict, eliminating a circular dependency between llama-stack and llama-stack-utils-common. Remove both llama-stack-utils-common and llama-stack-utils-inference as dependencies of the core llama-stack package.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Eoin Fennessy <efenness@redhat.com>
When providers were extracted to separate packages, their provider.py entry-point files received placeholder one-liner descriptions instead of the rich multi-paragraph documentation from the original registry files. This caused provider_codegen.py to generate degraded docs missing the Features, Usage, Installation, and Search Modes sections.

Restores descriptions for 10 providers with rich content (faiss, sqlite-vec, chromadb inline/remote, qdrant inline, weaviate, milvus, pgvector, vertexai, azure) and fixes 4 minor wording differences (llama-guard, milvus-inline, qdrant remote, s3). Also removes the unused BUILTIN_DEPS constant from the inference registry and regenerates all provider documentation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Eoin Fennessy <efenness@redhat.com>
cdoern left a comment:

Initial review regarding our release workflow actions. Generally, there has been a huge amount of change to the prepare-release and pypi YAML workflows. Let's see which of these changes are actually necessary and introduce only those. These workflows are crucial and easy to break, and breaking them would halt all future releases until they are fixed, so we need to be careful.
# --- Update llama-stack-client pins in pyproject.toml ---
# Update the pin in [project.optional-dependencies] and [dependency-groups]
# Match patterns like: "llama-stack-client>=X.Y.Z" or "llama-stack-client==X.Y.Z"
Let's try to preserve these comments if we can.
# Show what changed
echo "=== Changes ==="
git diff
git diff --stat
echo "version=${VERSION}" >> "$GITHUB_OUTPUT"
echo "Computed version: ${VERSION}"
# Build and validate release artifacts
what happened to the `- name: Check if package should be built` step?
The new workflow uses uv build --all-packages to build everything in a single job, so the per-package check is no longer needed. Now workspace packages are always built together.
id: client-ref
run: |
  if [ -n "${{ inputs.client_ref }}" ] && [ "${{ inputs.client_ref }}" != "main" ]; then
    # Explicit override from workflow_dispatch
please preserve these comments, all of the ones removed here.
# === PYTHON SETUP (for all Python packages) ===
# === PYTHON SETUP ===
- name: Set up Python
  if: steps.should-build.outputs.skip != 'true' && matrix.registry == 'pypi'
if [[ ! "$VERSION" =~ \.dev ]]; then
  echo "Checking if ${PACKAGE} ${VERSION} exists on PyPI..."

# Parse version components
done

if [ "$VERSION" != "$BASE_VERSION" ]; then
  echo "::notice::Bumped ${MATRIX_PACKAGE} (PyPI: ${PACKAGE}) version from ${BASE_VERSION} to ${VERSION} (already existed on PyPI)"
preserve this if/else logic
ls -la dist/

- name: Upload artifacts (local)
  if: steps.should-build.outputs.skip != 'true' && matrix.type == 'local'
it looks like we removed the differentiation between an external and a local build (the clients are external). Why?
External and local builds have been split from the single build-package job into two jobs: build-local-packages and build-external-package.
# Order: llama-stack-client-python, llama-stack-client-typescript, llama-stack-api, llama-stack
publish-packages:
  name: Publish ${{ matrix.package }}
# Publish all local workspace packages to PyPI in a single job
What is the point of this change?
…vider filtering

Remove the distribution-package-based provider filtering logic from the config generator. Instead, rely on the isolated venv created by distro_generate_config.sh to ensure only the correct providers are discovered via entry points. This eliminates the need to inspect distribution dependencies at generation time.

Also update distro_generate_config.sh to create a proper temporary venv instead of using uv run --project, and add --distribution flag support for environments with multiple distributions installed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Eoin Fennessy <efenness@redhat.com>
Add a new lightweight distribution package containing only the OpenAI inference provider, demonstrating the minimum viable distribution for the package-per-provider architecture.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Eoin Fennessy <efenness@redhat.com>
Summary
Proof-of-concept for #5478. Extracts the Ollama inference provider into a separate workspace package with entry-point-based discovery (924088e), and introduces a declarative distribution pattern built on top of it (ce99839). Commits after this (9762790 onwards) mock up a phased implementation of a full migration.
Provider extraction
See commit 924088e, which introduces uv workspaces and Python entry points, and moves the remote::ollama provider to its own package in the workspace.

- `[tool.uv.workspace]` added to the root `pyproject.toml`
- `packages/llama-stack-provider-inference-ollama/` added as a workspace member with its own `pyproject.toml`, dependencies, and entry point registration
- `_discover_providers_from_entry_points()` added to `get_provider_registry()` — entry-point providers supplement registry-file providers
- `pip_packages`
See commit ce99839 for the example implementation.
- Each distribution is a package: a `pyproject.toml` declaring core + provider dependencies, a `CONFIGS_DIR` Path exported via the `llama_stack.distributions` entry point group, and optional YAML patches deep-merged onto an auto-generated base config
- Config generation (`llama stack config generate`) discovers installed providers via entry points and builds a complete run config — no Python-as-config, no codegen scripts
- `_base` allows shared base patches (e.g. server settings) across variants
- The CLI (`llama stack config show`, `llama stack run`) discovers installed distribution packages via entry points, with `distro::variant` syntax for multiple configs
- `llama stack run` auto-selects the sole installed distribution when no config is specified
- `packages/llama-stack-distribution-demo/` as a working example using ollama inference
- `scripts/distro_generate_config.sh` for in-tree developers to generate configs in an isolated `uv run --project` environment
- `uv lock` and `uv sync` resolve the workspace correctly
- `inference.remote.ollama` is discoverable via `importlib.metadata`
- `get_provider_registry()` discovers the provider via entry point when the registry file entry is absent
- `uv build` produces a valid sdist + wheel with lockstep versioning via `setuptools-scm`
- `llama stack config generate` produces correct config from entry-point providers + patches
- `llama stack config show demo` resolves and prints config from the installed distribution package

Test plan
- `uv lock && uv sync` succeeds
- `uv run python -c "from importlib.metadata import entry_points; print(entry_points(group='llama_stack.providers'))"` returns the Ollama entry point
- `uv run python -m pytest tests/unit/distribution/ tests/unit/providers/utils/ --tb=short` — 290 passed
- `uv build --package efenness-llama-stack-provider-inference-ollama` produces a valid wheel
- `./scripts/distro_generate_config.sh packages/llama-stack-distribution-demo --patch packages/llama-stack-distribution-demo/patches/config.yaml` generates correct config
- `uv run llama stack config show demo` resolves the installed distribution and prints its config

🤖 Generated with Claude Code