Entity-First Architecture for Python — define business entities, declare relationships, let the framework assemble your data.
Requirements: Python 3.10+, Pydantic v2
Most FastAPI projects follow the same pattern: define SQLAlchemy ORM models first, then create Pydantic schemas that mirror them. This "ORM-First" approach is so common that many developers have never questioned it. But as projects grow, it creates systemic problems:
| # | Problem | Symptom |
|---|---|---|
| 1 | Schema passively follows ORM | Same fields defined twice; API contract tied to DB design |
| 2 | Business concepts lost | Frontend sees owner_id instead of "task has an owner" |
| 3 | Data assembly has no home | Join logic scatters across Repository / Service / Route |
| 4 | Multi-source data is hard | Each new data source means new conversion code everywhere |
| 5 | Schema reuse is hard | Copy-paste for UserSummary / UserDetail / UserAvatar |
These are not individual tooling issues. They all stem from one root cause: the absence of an independent business entity layer between the database and the API.

```python
# The data assembly dilemma: where does this logic go?
@router.get("/tasks")
async def get_tasks():
    tasks = await task_service.get_tasks()

    # Collect IDs, batch query, build mapping, assemble result...
    user_ids = list({t.owner_id for t in tasks})
    users = await user_service.get_users_by_ids(user_ids)
    user_map = {u.id: u for u in users}

    result = []
    for task in tasks:
        task_dict = task.model_dump()
        task_dict['owner'] = user_map.get(task.owner_id)
        result.append(TaskResponse(**task_dict))
    return result
```

Whether this code lives in Repository, Service, or Route, the problem is the same: data assembly logic has no proper place in traditional three-layer architecture.
pydantic-resolve provides the missing layer. It implements Entity-First Architecture, which maps naturally to Clean Architecture:
| Clean Architecture Layer | pydantic-resolve Component |
|---|---|
| Enterprise Business Rules | Entity + ER Diagram |
| Application Business Rules | Resolver + resolve/post |
| Interface Adapters | Loader (data access) |
| Frameworks & Interfaces | Response + FastAPI routes |
For the full analysis with code examples and migration guidance, see Entity-First Architecture.
pydantic-resolve provides three mechanisms:
| What you need | What you write | What the framework does |
|---|---|---|
| Load related data | `resolve_*` + `Loader(...)` | Batch lookups and map results back |
| Compute derived fields | `post_*` | Run after descendants are fully resolved |
| Reuse relationship declarations | ER Diagram + `AutoLoad` | Centralize relationship wiring for many models |
The same ERD also powers GraphQL queries, MCP services, and admin tools:

```mermaid
flowchart TB
    entity["Entity + ERD<br/>Business model & relationships"]
    resolve["Resolver<br/>resolve / post / expose / collector"]
    graphql["GraphQL Generator"]
    usecase["UseCase Service<br/>call_use_case / selection"]
    api["REST API"]
    mcp1["MCP Service<br/>UseCase-driven"]
    mcp2["MCP Service<br/>Schema-driven"]
    entity --> resolve
    entity --> graphql
    resolve --> usecase
    usecase --> api
    usecase --> mcp1
    graphql --> mcp2
```

If you just need to fix an N+1 problem on one endpoint, skip to Quick Start.

```python
# Before: manual N+1 assembly in your route
@router.get("/tasks")
async def get_tasks():
    tasks = await task_service.get_tasks()
    user_ids = list({t.owner_id for t in tasks})
    users = await user_service.get_users_by_ids(user_ids)
    user_map = {u.id: u for u in users}
    return [
        TaskResponse(**{**t.model_dump(), 'owner': user_map.get(t.owner_id)})
        for t in tasks
    ]
```

```python
# After: declare what's missing, let the framework assemble
class TaskView(BaseModel):
    id: int
    title: str
    owner_id: int
    owner: Optional[UserView] = None

    def resolve_owner(self, loader=Loader(user_loader)):
        return loader.load(self.owner_id)


@router.get("/tasks")
async def get_tasks():
    tasks = [TaskView.model_validate(t) for t in await task_repo.get_tasks()]
    return await Resolver().resolve(tasks)
```

```bash
pip install pydantic-resolve
pip install pydantic-resolve[mcp]  # with MCP support
```

Throughout the Quick Start, we build one API:

- `Sprint` has many `Task`
- `Task` has one `owner` (a `User`)
- The API also needs derived fields like `task_count` and `contributors`

Each step adds one concept on top of the previous code.
Every response model has some fields already filled (from the database, from user input) and some fields that need to be fetched separately. resolve_* is how you declare those missing fields.
Start with the simplest case: each task has an owner_id, and you want an owner object on the response.

```python
from typing import Optional

from pydantic import BaseModel
from pydantic_resolve import Loader, Resolver, build_object


class UserView(BaseModel):
    id: int
    name: str


async def user_loader(user_ids: list[int]):
    users = await db.query(User).filter(User.id.in_(user_ids)).all()
    return build_object(users, user_ids, lambda user: user.id)


class TaskView(BaseModel):
    id: int
    title: str
    owner_id: int
    owner: Optional[UserView] = None

    def resolve_owner(self, loader=Loader(user_loader)):
        return loader.load(self.owner_id)


tasks = [TaskView.model_validate(task) for task in raw_tasks]
tasks = await Resolver().resolve(tasks)
```

That is the core idea of the library:

- `owner` is missing data, so you describe how to fetch it.
- `user_loader` receives all requested `owner_id` values together.
- `Resolver().resolve(...)` walks the model tree and fills the field.

A useful mental model is: resolve_* means "this field needs data from outside the current node."
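
To make the batching behavior concrete, here is a minimal sketch of the general DataLoader idea in plain asyncio. `TinyLoader` and `demo` are hypothetical names used only for illustration; this is not pydantic-resolve's implementation, and it omits error handling. It only shows how several `load()` calls can be queued and then answered by a single call to the batch function.

```python
import asyncio
from typing import Awaitable, Callable, Hashable, Optional


class TinyLoader:
    """Illustrative batching loader; hypothetical, not the library's class."""

    def __init__(self, batch_fn: Callable[[list], Awaitable[list]]):
        self.batch_fn = batch_fn
        self.pending: dict = {}  # key -> Future awaiting that key's value
        self.dispatch_task: Optional[asyncio.Task] = None

    def load(self, key: Hashable) -> asyncio.Future:
        loop = asyncio.get_running_loop()
        if key not in self.pending:
            self.pending[key] = loop.create_future()
        if self.dispatch_task is None:
            # The task only starts once the caller yields to the event loop,
            # so sibling load() calls made before that join the same batch.
            self.dispatch_task = loop.create_task(self._dispatch())
        return self.pending[key]

    async def _dispatch(self) -> None:
        batch, self.pending, self.dispatch_task = self.pending, {}, None
        keys = list(batch)
        values = await self.batch_fn(keys)  # one call for every queued key
        for key, value in zip(keys, values):
            batch[key].set_result(value)


async def demo():
    calls = []  # records every batch the loader function receives

    async def fake_user_loader(user_ids):
        calls.append(list(user_ids))
        return [f"user-{i}" for i in user_ids]

    loader = TinyLoader(fake_user_loader)
    # Three tasks, two distinct owners.
    owners = await asyncio.gather(*(loader.load(i) for i in [1, 2, 1]))
    return list(owners), calls
```

Running `asyncio.run(demo())` shows three `load()` calls served by one batch containing the two distinct keys.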
Real APIs rarely have just one relationship. When Sprint contains many Tasks, and each Task already knows how to load its owner, the resolver walks the tree and batch-loads everything recursively.

```python
from typing import List

from pydantic_resolve import build_list


async def task_loader(sprint_ids: list[int]):
    tasks = await db.query(Task).filter(Task.sprint_id.in_(sprint_ids)).all()
    return build_list(tasks, sprint_ids, lambda task: task.sprint_id)


class SprintView(BaseModel):
    id: int
    name: str
    tasks: List[TaskView] = []

    def resolve_tasks(self, loader=Loader(task_loader)):
        return loader.load(self.id)


sprints = [SprintView.model_validate(sprint) for sprint in raw_sprints]
sprints = await Resolver().resolve(sprints)
```

Result: one query per loader, regardless of how many sprints or tasks you load.
This is why resolve_* is the best place to start. You can get value from the library before learning any advanced features.
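
Both loaders above hand their query results to a helper that lines them up with the requested keys. The sketch below shows that alignment in plain Python, assuming `build_object` follows a "one item per key" contract and `build_list` a "many items per key" contract, as their usage here suggests. `align_one` and `align_many` are hypothetical stand-ins, not library functions.

```python
from collections import defaultdict
from typing import Callable, Hashable, Iterable, Optional, TypeVar

T = TypeVar("T")


def align_one(
    items: Iterable[T], keys: list, get_key: Callable[[T], Hashable]
) -> list[Optional[T]]:
    """Return one item per key, in key order; None where nothing matched."""
    by_key = {get_key(item): item for item in items}
    return [by_key.get(key) for key in keys]


def align_many(
    items: Iterable[T], keys: list, get_key: Callable[[T], Hashable]
) -> list[list[T]]:
    """Return a list of items per key, in key order; [] where nothing matched."""
    grouped = defaultdict(list)
    for item in items:
        grouped[get_key(item)].append(item)
    return [grouped.get(key, []) for key in keys]


# Rows arrive in arbitrary order, as a database query might return them.
users = [{"id": 2, "name": "bob"}, {"id": 1, "name": "alice"}]
tasks = [
    {"id": 10, "sprint_id": 1},
    {"id": 11, "sprint_id": 2},
    {"id": 12, "sprint_id": 1},
]

owners = align_one(users, [1, 3, 2], lambda u: u["id"])
tasks_per_sprint = align_many(tasks, [2, 1, 9], lambda t: t["sprint_id"])
```

The key point is positional: the n-th result corresponds to the n-th requested key, which is what lets a batching loader hand each caller its own value.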
task_count and contributor_names don't come from a query — they're derived from data already on the model. post_* handles these: it runs after all nested resolve_* calls have finished.

```python
class SprintView(BaseModel):
    id: int
    name: str
    tasks: List[TaskView] = []
    task_count: int = 0
    contributor_names: list[str] = []

    def resolve_tasks(self, loader=Loader(task_loader)):
        return loader.load(self.id)

    def post_task_count(self):
        return len(self.tasks)

    def post_contributor_names(self):
        return sorted({task.owner.name for task in self.tasks if task.owner})
```

Execution order:

- `resolve_tasks` loads the sprint's tasks.
- Each `TaskView.resolve_owner` loads its owner.
- `post_task_count` and `post_contributor_names` run after those nested fields are ready.

| | `resolve_*` | `post_*` |
|---|---|---|
| Needs external IO? | Yes | Usually no |
| Runs before descendants ready? | Yes | No |
| Good for counts, sums, formatting? | Sometimes | Yes |
| Return value resolved again? | Yes | No |
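
The ordering guarantee in the table can be pictured as a tree walk in which each node's resolve phase runs on the way down and its post phase runs on the way back up. The following is a simplified sketch with hypothetical names (`Node`, `walk`, `trace`); it visits siblings one at a time to keep the log readable and ignores batching, so it illustrates only the ordering, not how the resolver is actually implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list["Node"] = field(default_factory=list)


def walk(node: Node, log: list[str]) -> None:
    # Resolve phase: the node fetches its own missing data first.
    log.append(f"resolve {node.name}")
    # Descendants are processed before the node's post phase.
    for child in node.children:
        walk(child, log)
    # Post phase: everything below this node is complete by now.
    log.append(f"post {node.name}")


def trace() -> list[str]:
    tree = Node("sprint", [Node("task-1"), Node("task-2")])
    log: list[str] = []
    walk(tree, log)
    return log
```

In the returned log, `post sprint` is always the last entry, which is why `post_task_count` can safely read `self.tasks`.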
These two patterns cover most API endpoints. The next section covers cross-layer data flow — skip to ER Diagram if you don't need it yet.
When parent and child nodes need to share data without hard-coding references to each other:
- `ExposeAs`: send ancestor data downward
- `SendTo` + `Collector`: send child data upward

```python
from typing import Annotated

from pydantic_resolve import Collector, ExposeAs, SendTo


# TaskView is defined first so that SprintView can reference it.
class TaskView(BaseModel):
    id: int
    title: str
    owner_id: int
    owner: Annotated[Optional[UserView], SendTo('contributors')] = None
    full_title: str = ""

    def resolve_owner(self, loader=Loader(user_loader)):
        return loader.load(self.owner_id)

    def post_full_title(self, ancestor_context):
        return f"{ancestor_context['sprint_name']} / {self.title}"


class SprintView(BaseModel):
    id: int
    name: Annotated[str, ExposeAs('sprint_name')]
    tasks: List[TaskView] = []
    contributors: list[UserView] = []

    def resolve_tasks(self, loader=Loader(task_loader)):
        return loader.load(self.id)

    def post_contributors(self, collector=Collector('contributors')):
        return collector.values()
```

Use this when the shape of the tree matters: for example, a child needs ancestor context (sprint name, permissions), or a parent needs to aggregate values from many descendants (all contributors, all tags).
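
For readers who want to see the two directions of data flow without any framework involved, here is a hand-written sketch of the same idea. The `Sprint` and `Task` dataclasses and the `assemble` function are hypothetical; the sketch keeps duplicate contributors because the text above does not say whether the library's collector deduplicates.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    title: str
    owner: str
    full_title: str = ""


@dataclass
class Sprint:
    name: str
    tasks: list[Task] = field(default_factory=list)
    contributors: list[str] = field(default_factory=list)


def assemble(sprint: Sprint) -> Sprint:
    # Downward flow: the ancestor publishes a value under an alias.
    ancestor_context = {"sprint_name": sprint.name}
    # Upward flow: descendants push values into a shared bucket.
    collected: list[str] = []

    for task in sprint.tasks:
        task.full_title = f"{ancestor_context['sprint_name']} / {task.title}"
        collected.append(task.owner)

    sprint.contributors = collected
    return sprint


board = assemble(
    Sprint("Sprint 1", [Task("Login", "alice"), Task("Logout", "bob")])
)
```

Neither class refers to the other's fields by name here; they only agree on the alias `sprint_name` and on the bucket, which is the decoupling the annotations are meant to provide.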
ER Diagram + AutoLoad is where Entity-First Architecture fully crystallizes: relationships become the stable core, and every Response is just a different view of the same Entity graph.
Up to this point, the Core API is enough. Stay there until relationship declarations start repeating across many response models.
A common signal is when you see the same relation described again and again:
- `TaskCard.resolve_owner`
- `TaskDetail.resolve_owner`
- `SprintBoard.resolve_tasks`
- `SprintReport.resolve_tasks`
At that point, the problem is no longer "how do I load this field?" but "where is the source of truth for relationships?"
| Question | Hand-written Core API | ER Diagram + AutoLoad |
|---|---|---|
| First endpoint | Faster | Slower |
| Upfront setup | Low | Medium |
| Reusing the same relation in many models | Repetitive | Centralized |
| Changing a relationship later | Update many `resolve_*` methods | Update one ERD declaration |
| GraphQL / MCP generation | Separate work | Natural extension |
ERD mode asks for more discipline up front:
- Define entity classes.
- Declare relationships explicitly.
- Create `AutoLoad` from the same `diagram` used by the resolver.
That setup cost is real. The payoff is that relationship knowledge moves into one place.
Here is the same Sprint -> Task -> User example after moving relationship wiring into an ER Diagram:

```python
from typing import Annotated, Optional

from pydantic import BaseModel
from pydantic_resolve import Relationship, base_entity, config_global_resolver

BaseEntity = base_entity()


class UserEntity(BaseModel, BaseEntity):
    id: int
    name: str


class TaskEntity(BaseModel, BaseEntity):
    __relationships__ = [
        Relationship(fk='owner_id', name='owner', target=UserEntity, loader=user_loader)
    ]
    id: int
    title: str
    owner_id: int


class SprintEntity(BaseModel, BaseEntity):
    __relationships__ = [
        Relationship(fk='id', name='tasks', target=list[TaskEntity], loader=task_loader)
    ]
    id: int
    name: str


diagram = BaseEntity.get_diagram()
AutoLoad = diagram.create_auto_load()
config_global_resolver(diagram)


class TaskView(TaskEntity):
    owner: Annotated[Optional[UserEntity], AutoLoad()] = None


class SprintView(SprintEntity):
    tasks: Annotated[list[TaskView], AutoLoad()] = []
    task_count: int = 0

    def post_task_count(self):
        return len(self.tasks)
```

Compared with the Core API version:

- `resolve_owner` disappears.
- `resolve_tasks` disappears.
- The relationship definitions live in one place.
- `post_*` still works exactly the same.

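
The idea of a single source of truth for relationships can be sketched without the library: a registry maps each entity to its declared relations, and every view asks the registry rather than carrying its own loading code. All names below (`Rel`, `RELATIONSHIPS`, `load_users`, `auto_load`) are hypothetical and synchronous for brevity; this shows the design idea only, not how `AutoLoad` works internally.

```python
from dataclasses import dataclass
from typing import Callable

USERS = {1: "alice", 2: "bob"}


def load_users(ids: list[int]) -> list:
    """Stand-in batch loader: one value per requested id, in order."""
    return [USERS.get(i) for i in ids]


@dataclass(frozen=True)
class Rel:
    fk: str        # field on the source row holding the key
    name: str      # field to fill on the result
    loader: Callable[[list], list]


# Relationships are declared once, per entity.
RELATIONSHIPS = {
    "Task": [Rel(fk="owner_id", name="owner", loader=load_users)],
}


def auto_load(entity: str, rows: list[dict], wanted: set[str]) -> list[dict]:
    """Fill only the relations a given view asks for."""
    out = [dict(row) for row in rows]  # copy so callers' rows stay untouched
    for rel in RELATIONSHIPS.get(entity, []):
        if rel.name not in wanted:
            continue
        values = rel.loader([row[rel.fk] for row in out])
        for row, value in zip(out, values):
            row[rel.name] = value
    return out


rows = [{"id": 10, "owner_id": 2}, {"id": 11, "owner_id": 1}]
task_cards = auto_load("Task", rows, wanted={"owner"})  # view with owner
task_ids = auto_load("Task", rows, wanted=set())        # view without owner
```

Changing how `owner` is loaded now means editing one `Rel` entry, and both views pick up the change.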
If you want to hide internal FK fields such as owner_id, add DefineSubset on top of the ERD setup:

```python
from pydantic_resolve import DefineSubset


class TaskSummary(DefineSubset):
    __subset__ = (TaskEntity, ('id', 'title'))
    owner: Annotated[Optional[UserEntity], AutoLoad()] = None
```

Once ERD mode makes sense conceptually, you can let the ORM describe the relationships for you and import them into the application-layer ERD.

```python
from pydantic_resolve import ErDiagram, config_global_resolver
from pydantic_resolve.integration.mapping import Mapping
from pydantic_resolve.integration.sqlalchemy import build_relationship

entities = build_relationship(
    mappings=[
        Mapping(entity=SprintEntity, orm=SprintORM),
        Mapping(entity=TaskEntity, orm=TaskORM),
        Mapping(entity=UserEntity, orm=UserORM),
    ],
    session_factory=session_factory,
)

diagram = ErDiagram(entities=[]).add_relationship(entities)
AutoLoad = diagram.create_auto_load()
config_global_resolver(diagram)
```

`build_relationship` supports SQLAlchemy, Django, and Tortoise ORM. This is a good later optimization when your ORM metadata is already stable and you want to avoid duplicating relationship declarations.
- Start with hand-written `resolve_*` and `post_*` on one endpoint.
- Move repeated relations into ERD when multiple models need the same wiring.
- Let `build_relationship()` read ORM metadata when the ORM is already the source of truth.
ERD mode is a good fit when:
- The project has 3+ related entities reused across multiple response models.
- The team wants one shared place to inspect and discuss relationships.
- You want GraphQL or MCP generated from the same model graph.
- You want to hide FK fields while keeping relationship definitions centralized.
Core API is usually enough when:
- You only have a few loading requirements.
- You want each endpoint to stay maximally explicit.
- The response shape is still changing quickly.
The same ERD that drives REST APIs also powers GraphQL queries, MCP services, and admin tools.
Generate GraphQL schema from ERD and execute queries:

```python
from pydantic_resolve.graphql import GraphQLHandler

handler = GraphQLHandler(diagram)
result = await handler.execute("{ users { id name posts { title } } }")
# result.data == {"users": [{"id": 1, "name": "Alice", "posts": [{"title": "Hello"}]}, ...]}
```

Expose GraphQL APIs to AI agents (requires `pip install pydantic-resolve[mcp]`):

```python
from pydantic_resolve import AppConfig, create_mcp_server

mcp = create_mcp_server(apps=[AppConfig(name="blog", er_diagram=diagram)])
mcp.run()
# Agents can then query: "list all posts by user Alice" → translated to GraphQL against your ERD
```

Interactive ERD exploration with fastapi-voyager:

```python
from fastapi_voyager import create_voyager

app.mount('/voyager', create_voyager(app, er_diagram=diagram))
```

- Loader returns `None`: `resolve_*` fields stay at their default value. No error is raised.
- Circular dependencies: The resolver detects cycles and raises `ResolverError` at resolve time.
- Large datasets: Each loader collects all keys in one batch query. For very large key sets (10k+), consider pagination before resolving.
- Async only: `Resolver().resolve()` is async; there is no synchronous API.
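
Two of the notes above lend themselves to a small sketch. The list recommends paginating before resolving; a different option, shown here purely as an assumption, is to split a very large key set into chunks inside your own loader function. The sketch also shows one way to reach async code from a synchronous entry point with `asyncio.run`. The names `chunked`, `load_in_chunks`, `fake_loader`, and `load_sync` are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable


def chunked(keys: list, size: int) -> list[list]:
    """Split keys into consecutive slices of at most `size` items."""
    return [keys[i:i + size] for i in range(0, len(keys), size)]


async def load_in_chunks(
    batch_fn: Callable[[list], Awaitable[list]], keys: list, size: int = 1000
) -> list:
    """Call batch_fn once per chunk and keep results in key order."""
    results: list = []
    for chunk in chunked(keys, size):
        results.extend(await batch_fn(chunk))
    return results


async def fake_loader(ids: list[int]) -> list[int]:
    # Stand-in for a database query that returns one value per id.
    return [i * 10 for i in ids]


def load_sync(keys: list[int], size: int) -> list[int]:
    # asyncio.run starts a fresh event loop; it raises RuntimeError if a
    # loop is already running in this thread (e.g. inside an async route).
    return asyncio.run(load_in_chunks(fake_loader, keys, size))
```

Chunking trades one large query for several smaller ones, so it is worth measuring against your database before adopting it.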
| Dimension | ORM-First | Entity-First |
|---|---|---|
| Type source of truth | ORM model | Entity (Pydantic) |
| Relationship wiring | Repeated per endpoint | Centralized in ERD |
| Data assembly | Manual in Service/Route | Automatic via Resolver |
| N+1 prevention | Manual eager loading | Built-in DataLoader batching |
| Multi-data source | Scattered conversion code | Unified Loader interface |
| API contract stability | Tied to DB schema | Independent of DB |
| Feature | GraphQL | pydantic-resolve |
|---|---|---|
| N+1 Prevention | Manual DataLoader setup | Built-in automatic batching |
| Type Safety | Separate schema files | Native Pydantic types |
| Learning Curve | Steep (Schema, Resolvers, Loaders) | Moderate (Loader/batch pattern required) |
| Debugging | Complex introspection | Standard Python debugging |
| Integration | Requires dedicated server | Works with any framework |
| Query Flexibility | Any client can query anything | Explicit API contracts |
Note: pydantic-resolve borrows the DataLoader batch pattern from GraphQL ecosystems. The main difference is that you keep your existing REST framework and get automatic batching without adopting a full GraphQL server. If your project already uses strawberry or ariadne and is happy with it, pydantic-resolve may be redundant.
- Full Documentation
- Entity-First Architecture (full paper)
- Example Project
- Live Demo
- Live Demo - GraphQL
- API Reference
MIT License
tangkikodo (allmonday@126.com)