Quickstart

Start the server

Make sure you have uv and Ollama installed, then pull a model, point Llama Stack at Ollama, and start the server:

ollama pull llama3.2:3b
export OLLAMA_URL=http://localhost:11434/v1
uvx --from 'llama-stack[starter]' llama stack run starter
Project setup

The uvx command above is great for trying things out. For a real project, install into a persistent environment:

uv init my-ai-app && cd my-ai-app
uv add 'llama-stack[starter]' openai
export OLLAMA_URL=http://localhost:11434/v1
uv run llama stack run starter

The server is now running at http://localhost:8321. You can use any OpenAI-compatible client.

Verify it works

Before writing any code, confirm the server is healthy and models are registered:

curl -s http://localhost:8321/v1/models | python -m json.tool

You should see output listing available models, for example:

{
  "data": [
    {
      "id": "ollama/llama3.2:3b",
      "object": "model",
      "owned_by": "llama_stack",
      ...
    }
  ]
}

If the list is empty or the command fails, check the Troubleshooting section below.
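The same check can be scripted with nothing but the standard library. A minimal sketch, assuming the server is on the default port 8321:

```python
import json
import urllib.request

def model_ids(payload):
    """Extract model ids from a /v1/models response body."""
    return [m["id"] for m in payload.get("data", [])]

try:
    with urllib.request.urlopen("http://localhost:8321/v1/models", timeout=5) as resp:
        print("registered models:", model_ids(json.load(resp)))
except OSError as err:
    # Covers connection refused, timeouts, and HTTP errors
    print("server not reachable:", err)
```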

Try it out

Open a new terminal and run:

app.py

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="fake")

response = client.responses.create(
    model="ollama/llama3.2:3b",
    input="What is Llama Stack?",
)
print(response.output_text)

If you followed the project setup above, run it with uv run python app.py. Otherwise, install the SDK first:

pip install openai && python app.py
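Responses can also be streamed by passing stream=True to client.responses.create and iterating the returned events. The helper below shows how the text deltas fit together; the events here are plain dicts for illustration (the SDK yields typed event objects), so the sketch runs without a server:

```python
def collect_output_text(events):
    """Concatenate text deltas from a Responses API event stream.

    With a live client you would iterate the result of
    client.responses.create(..., stream=True) and read
    event.type / event.delta instead of dict keys.
    """
    chunks = []
    for event in events:
        if event.get("type") == "response.output_text.delta":
            chunks.append(event.get("delta", ""))
    return "".join(chunks)

# Events shaped like a streamed response, for illustration only
sample = [
    {"type": "response.output_text.delta", "delta": "Llama Stack is "},
    {"type": "response.output_text.delta", "delta": "an API server."},
    {"type": "response.completed"},
]
print(collect_output_text(sample))
```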

Add RAG in 10 lines

Upload a file, create a vector store, and ask questions about it:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="fake")

# Upload a document
file = client.files.create(
    file=open("my-document.pdf", "rb"),
    purpose="assistants",
)

# Create a vector store and index the file
vector_store = client.vector_stores.create(
    name="my-docs",
    file_ids=[file.id],
)

# Ask questions with file search
response = client.responses.create(
    model="ollama/llama3.2:3b",
    input="Summarize the key points",
    tools=[{
        "type": "file_search",
        "vector_store_ids": [vector_store.id],
    }],
)
print(response.output_text)

That's it. Same OpenAI SDK, local model, your own vector store.
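Indexing is not instant for larger documents, so a query fired immediately after creating the vector store may miss the file. A small sketch of a readiness check; the poll loop is commented out so the snippet stands alone, and the file_counts dict mirrors the shape the API reports (an assumption worth verifying against your server's responses):

```python
def is_indexed(file_counts):
    """True once every file attached to the vector store has finished indexing."""
    return file_counts.get("completed", 0) == file_counts.get("total", 0) > 0

# With a live client, poll roughly like this:
# import time
# while True:
#     vs = client.vector_stores.retrieve(vector_store.id)
#     if is_indexed(vs.file_counts.model_dump()):
#         break
#     time.sleep(1)

print(is_indexed({"completed": 1, "in_progress": 0, "failed": 0, "total": 1}))
```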

Troubleshooting

Port already in use

If you see Address already in use, another process is using port 8321. Either stop it or run on a different port:

uvx --from 'llama-stack[starter]' llama stack run starter --port 8322
Connection refused

If curl returns Connection refused, the server has not finished starting. Wait a few seconds for model registration to complete, then try again. Check the terminal where you started the server for errors.

Model not found

If the API returns an error about a model not being found, make sure you pulled the model first. For Ollama:

ollama pull llama3.2:3b

Then restart the Llama Stack server. You can verify available models with curl http://localhost:8321/v1/models.

What's next?