feat: add Apache AGE as an alternative graph backend #1326
Add Apache AGE (openCypher over PostgreSQL) as a third VectorGraphStore
implementation alongside Neo4j and NebulaGraph. MemMachine already
requires Postgres + pgvector for semantic memory; an AGE-enabled Postgres
image can ship both the ``age`` and ``vector`` extensions in one instance,
letting episodic + semantic memory share a single database service.
NebulaGraph already covers Apache-2.0
licensing for users at scale, but bundling it means running a 3-role
distributed system; AGE is the natural fit for single-instance and
small-cluster deployments that want to drop Neo4j without adopting a
new product.
Backend implementation:
- AgeVectorGraphStore + age_utils: full VectorGraphStore against AGE +
pgvector. Graph structure lives in AGE; per-embedding similarity goes
to pgvector side tables keyed by vertex uid, tracked via a per-graph
registry table. Supports pgvector's full 1..16000 dimension range
rather than inheriting Neo4j's tighter cap. Asserts AGE >= 1.6.0 on
first use so older versions fail with a clear message instead of
"SET clause expects a map".
- AgeConf + DatabaseManager plumbing mirroring Neo4jConf. The connect-
event hook is extracted to a named ``_register_age_connect_hook`` so
tests patch a stable seam instead of SQLAlchemy internals.
- /resources status endpoint surfaces age (and, while touching that
code, the pre-existing nebula_graph gap).
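A minimal sketch of the kind of version gate described above, assuming the installed version string is read with ``SELECT extversion FROM pg_extension WHERE extname = 'age'``. ``parse_version`` and ``assert_min_age_version`` are hypothetical names, not the PR's actual helpers:

```python
import re

# Minimum Apache AGE release accepted (the PR asserts AGE >= 1.6.0).
MIN_AGE_VERSION = (1, 6, 0)


def parse_version(text: str) -> tuple[int, ...]:
    """Parse a dotted version such as '1.6.0' into (1, 6, 0).

    Only the leading digits of each component are kept, so '0-rc1' -> 0.
    """
    parts = []
    for piece in text.strip().split("."):
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def assert_min_age_version(
    extversion: str, minimum: tuple[int, ...] = MIN_AGE_VERSION
) -> None:
    """Fail fast with a clear message if the installed AGE is too old."""
    found = parse_version(extversion)
    width = max(len(found), len(minimum))
    # Zero-pad both tuples so '1.6' compares equal to '1.6.0'.
    found_padded = found + (0,) * (width - len(found))
    minimum_padded = minimum + (0,) * (width - len(minimum))
    if found_padded < minimum_padded:
        wanted = ".".join(str(part) for part in minimum)
        raise RuntimeError(
            f"Apache AGE >= {wanted} is required, found {extversion!r}; "
            'older releases fail later with "SET clause expects a map".'
        )
```

Running the check once per engine (on first use) keeps the cost to a single catalog query.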
Deployment (Docker Compose):
- deployments/docker/postgres-age/Dockerfile layers
postgresql-16-pgvector on apache/age:release_PG16_1.6.0. Published as
memmachine/postgres-age:pg16-1.6.0 via a new build-postgres-age job
in .github/workflows/docker-image.yml.
- docker-compose.age.yml: self-contained single-Postgres AGE stack
(``docker compose -f docker-compose.age.yml up``). The base
docker-compose.yml keeps postgres-age under an opt-in ``age`` profile
for the "run AGE alongside a pip-installed server" flow.
Deployment (Helm, chart bumped 0.1.0 -> 0.2.0):
- New top-level ``episodicBackend: neo4j | age`` toggle, orthogonal to
the per-backend ``*.enabled`` flags (which keep their meaning of
"deploy in-cluster vs use external host"). Validation guard fails
cleanly on an invalid value.
- In AGE mode, ``postgres.enabled`` is ignored: the standalone pgvector
Deployment/Service/PVC are skipped, and db_postgres + db_age both
target the AGE-enabled Postgres (which ships vector too). Single
database service for session_manager, episode_store, semantic memory,
config_database, and the graph store. The ``wait-for-postgres`` init
container and ``POSTGRES_PASSWORD`` env-var source follow the active
backend.
- postgres-age-secret renames its key from POSTGRES_AGE_PASSWORD to
POSTGRES_PASSWORD so the MemMachine Deployment sources the same env
var name from either postgres-secret or postgres-age-secret.
- postgresAge.* values expose SQLAlchemy pool knobs (pool_size,
max_overflow, pool_timeout, pool_recycle, pool_pre_ping) for parity
with postgres.* in Neo4j mode.
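The toggle's routing rules can be modeled in Python as a sanity check. This is a hypothetical mirror of the Helm template logic, not chart code: the ``in-cluster:<service>`` / ``external`` return values are placeholders for "deployed by the chart" vs "user-supplied host", and the error text copies the guard message quoted in the test results:

```python
VALID_EPISODIC_BACKENDS = ("neo4j", "age")


def validate_episodic_backend(value: str) -> str:
    """Reject anything other than the supported episodicBackend values."""
    if value not in VALID_EPISODIC_BACKENDS:
        allowed = ", ".join(VALID_EPISODIC_BACKENDS)
        raise ValueError(
            f'episodicBackend must be one of: {allowed} (got "{value}")'
        )
    return value


def database_targets(
    episodic_backend: str, *, postgres_enabled: bool, postgres_age_enabled: bool
) -> dict[str, str]:
    """Decide where db_postgres / db_age point (hypothetical helper).

    "in-cluster:<service>" means the chart deploys it; "external" means a
    user-supplied host. Service names are placeholders.
    """
    backend = validate_episodic_backend(episodic_backend)
    if backend == "age":
        # postgres.enabled is ignored in AGE mode: both resources share the
        # AGE-enabled Postgres, which also ships the vector extension.
        target = "in-cluster:postgres-age" if postgres_age_enabled else "external"
        return {"db_postgres": target, "db_age": target}
    # Neo4j mode: postgres.enabled keeps its in-cluster vs external meaning.
    target = "in-cluster:postgres" if postgres_enabled else "external"
    return {"db_postgres": target}
```

The key property is orthogonality: ``episodicBackend`` picks the topology, while each ``*.enabled`` flag only decides whether its service is deployed or external.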
Installation:
- memmachine-configure prompts for graph backend choice and routes
accordingly: Neo4j mode keeps the existing auto-install flow; AGE
mode falls through to the wizard's interactive connection prompts
(users stand up their own AGE-enabled Postgres via
docker-compose.age.yml, helm, or existing infra).
- Configuration wizard emits a paired AgeConf + SqlAlchemyConf at the
same Postgres so semantic memory's pgvector store rides on the AGE
database. AGE connection defaults consolidated into an
``AgeDefaults`` dataclass.
- Fixes a pre-existing bug where semantic_memory.database in generated
configs pointed at NEO4J_DB_ID (a graph backend, not a relational
one). Now left unset in Neo4j mode so SemanticMemoryConf's existing
auto-disable validator fires cleanly, and targets the co-located
Postgres in AGE mode.
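A rough sketch of the wizard's single-Postgres pairing, assuming a plain-dict representation. The resource ids (``age_graph``, ``age_postgres``), the ``kind`` key, and ``PostgresTarget`` are hypothetical and do not reflect the real ``AgeConf`` / ``SqlAlchemyConf`` schemas:

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass(frozen=True)
class PostgresTarget:
    """Connection details for the single AGE-enabled Postgres (illustrative)."""

    host: str
    port: int
    user: str
    password: str
    db_name: str


def plan_database_resources(
    backend: str, pg: Optional[PostgresTarget] = None
) -> Tuple[Dict[str, Dict[str, Any]], Optional[str]]:
    """Return (extra database resources, value for semantic_memory.database)."""
    if backend == "neo4j":
        # Leave semantic_memory.database unset so SemanticMemoryConf's
        # auto-disable validator applies, rather than pointing at a graph DB.
        return {}, None
    if backend == "age":
        if pg is None:
            raise ValueError("AGE mode needs Postgres connection details")
        conn = asdict(pg)
        resources = {
            # Hypothetical ids and "kind" values: one graph resource and one
            # relational resource, both aimed at the same Postgres.
            "age_graph": {"kind": "age", **conn},
            "age_postgres": {"kind": "sqlalchemy", **conn},
        }
        # Semantic memory's pgvector store rides on the AGE database.
        return resources, "age_postgres"
    raise ValueError(f"unknown graph backend: {backend!r}")
```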
Tests:
- Unit: test_age_utils (agtype round-trip, cypher wrapping, sanitizer,
parameter encoding); AgeConf lifecycle tests in test_database_manager
mirror the Neo4jConf ones; wizard tests cover AGE interactive mode
and invalid-backend rejection.
- Integration: test_age_vector_graph_store exercises the store end-to-
end against a testcontainer built from the shipped postgres-age
Dockerfile, so the test environment matches prod.
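For context on what an agtype round-trip test exercises: AGE returns graph values as text with a type suffix, e.g. ``{...}::vertex``. A simplified decoder, assuming values arrive as text and ignoring paths (whose nested elements carry their own suffixes), might look like this; ``parse_agtype`` and ``encode_agtype`` are hypothetical names:

```python
import json
from typing import Any, Optional

# Suffixes AGE appends to the text form of graph entities.
_ENTITY_SUFFIXES = ("::vertex", "::edge")


def parse_agtype(text: Optional[str]) -> Any:
    """Decode one agtype value returned as text.

    Handles scalars, maps, lists, and a single top-level vertex/edge.
    Paths are not handled: their nested elements carry their own suffixes.
    """
    if text is None:  # SQL NULL
        return None
    body = text.strip()
    for suffix in _ENTITY_SUFFIXES:
        if body.endswith(suffix):
            body = body[: -len(suffix)]
            break
    return json.loads(body)


def encode_agtype(value: Any) -> str:
    """Encode a plain Python value as agtype text (JSON is valid agtype input)."""
    return json.dumps(value)
```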
Docs: README, USAGE, DOCKER_COMPOSE_README, and the helm README updated
with AGE sections; sample_configs/episodic_memory_config.age.sample
ships a complete single-Postgres config example.
Notes:
- AGE 1.6.0 rejects ``SET n += $param`` inside cypher() ("SET clause
expects a map") even when the parameter resolves to a map; aliasing
via ``WITH $param AS p`` before SET sidesteps it.
- Side-table names are SHA1-hashed because Postgres' 63-byte identifier
limit cannot fit the full (graph, kind, collection, embedding_name)
tuple; the registry table preserves the reverse mapping for cleanup
and hydration.
- The docker-image.yml postgres-age build targets linux/amd64 only;
apache/age has had inconsistent arm64 availability historically.
Flip to multi-arch once confirmed upstream.
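To make the ``WITH $param AS p`` workaround concrete, here is a sketch of a query builder that wraps Cypher in AGE's ``cypher(graph_name, query, params)`` SQL function and aliases the property map before ``SET``. The helper names and the ``uid`` / ``props`` parameter keys are illustrative assumptions, not the PR's ``age_utils`` API:

```python
import json
import re

# Graph names, labels, and column aliases are interpolated into SQL/Cypher
# text, so only plain identifiers are accepted.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _check_ident(name: str) -> str:
    if not _IDENT.fullmatch(name):
        raise ValueError(f"not a safe identifier: {name!r}")
    return name


def wrap_cypher(graph: str, cypher: str, columns: list[str], *, params: bool) -> str:
    """Wrap Cypher text in AGE's cypher() SQL function."""
    if "$$" in cypher:
        raise ValueError("Cypher text must not contain '$$'")
    cols = ", ".join(f"{_check_ident(c)} agtype" for c in columns)
    # AGE receives all Cypher parameters as one agtype map, bound here to $1.
    param_arg = ", $1" if params else ""
    return (
        f"SELECT * FROM cypher('{_check_ident(graph)}', "
        f"$$ {cypher} $${param_arg}) AS ({cols})"
    )


def merge_properties_cypher(label: str) -> str:
    """Merge a property map into the vertex with the given uid.

    AGE 1.6.0 rejects `SET n += $props`, so the map is aliased via WITH.
    """
    return (
        f"MATCH (n:{_check_ident(label)} {{uid: $uid}}) "
        "WITH n, $props AS p "
        "SET n += p "
        "RETURN n"
    )


def encode_params(**params: object) -> str:
    """JSON-encode the parameter map (JSON text is valid agtype input)."""
    return json.dumps(params)
```

The encoded map is sent as the single ``$1`` argument; how the driver binds it as ``agtype`` (the per-connection AGE setup) is outside this sketch.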
Pull request overview
Adds Apache AGE (openCypher over PostgreSQL) as an additional VectorGraphStore backend, enabling a “single-Postgres” deployment where episodic (graph) and semantic (pgvector) storage can share one database service, plus corresponding installer and deployment support.
Changes:
- Introduce `AgeVectorGraphStore` and `age_utils`, plus `AgeConf`/`DatabaseManager` plumbing to configure and run AGE as a graph backend.
- Add deployment artifacts for AGE (Dockerfile + Compose + Helm chart toggles/templates) and CI publishing for a `postgres-age` image.
- Update installer/wizard flows, sample configs, docs, and tests (unit + integration) to cover AGE and the “single-Postgres” path.
Reviewed changes
Copilot reviewed 32 out of 32 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| sample_configs/episodic_memory_config.age.sample | New sample configuration for an AGE-backed stack. |
| packages/server/src/memmachine_server/server/api_v2/config_service.py | Surface AGE (and NebulaGraph) DB configs in /resources status output. |
| packages/server/src/memmachine_server/installation/utilities.py | Add default prompt values for AGE connection details. |
| packages/server/src/memmachine_server/installation/memmachine_configure.py | Add backend selection prompt and route installer flow based on backend. |
| packages/server/src/memmachine_server/installation/configuration_wizard.py | Add AGE wizard mode and emit paired AgeConf + SqlAlchemyConf for single-Postgres deployments. |
| packages/server/src/memmachine_server/common/vector_graph_store/age_vector_graph_store.py | New AGE-backed VectorGraphStore implementation using AGE + pgvector side tables. |
| packages/server/src/memmachine_server/common/resource_manager/database_manager.py | Add AGE engine lifecycle, validation, and graph store construction. |
| packages/server/src/memmachine_server/common/configuration/database_conf.py | Add AgeConf, SupportedDB.AGE, and DatabasesConf.age_confs parsing/serialization. |
| packages/server/src/memmachine_server/common/age_utils.py | New helpers for agtype parsing, Cypher wrapping, identifier sanitization, and per-connection AGE setup. |
| packages/server/server_tests/memmachine_server/installation/test_memmachine_configure.py | Update installer tests for the new backend selection prompt. |
| packages/server/server_tests/memmachine_server/installation/test_configuration_wizard.py | Add wizard tests for AGE mode and invalid backend values. |
| packages/server/server_tests/memmachine_server/common/vector_graph_store/test_age_vector_graph_store.py | New Docker-backed integration tests for AgeVectorGraphStore. |
| packages/server/server_tests/memmachine_server/common/test_age_utils.py | New unit tests for age_utils pure helpers. |
| packages/server/server_tests/memmachine_server/common/resource_manager/test_database_manager.py | Add unit tests for AGE engine/store construction and pool kwarg forwarding. |
| docker-compose.yml | Add opt-in postgres-age service under an age profile. |
| docker-compose.age.yml | New self-contained Compose stack running MemMachine against AGE. |
| deployments/helm/values.yaml | Add episodicBackend toggle and postgresAge values to support AGE deployments. |
| deployments/helm/templates/secrets.yaml | Add postgres-age-secret for AGE Postgres password wiring. |
| deployments/helm/templates/pvc.yaml | Skip standalone Postgres PVC in AGE mode; add PVC for postgres-age when enabled. |
| deployments/helm/templates/postgres-service.yaml | Skip standalone Postgres service in AGE mode. |
| deployments/helm/templates/postgres-deployment.yaml | Skip standalone Postgres deployment in AGE mode; document behavior. |
| deployments/helm/templates/postgres-age-service.yaml | New service for in-cluster postgres-age. |
| deployments/helm/templates/postgres-age-deployment.yaml | New deployment for in-cluster postgres-age. |
| deployments/helm/templates/memmachine-deployment.yaml | Adjust init containers and secrets/env wiring for AGE vs Neo4j mode. |
| deployments/helm/templates/memmachine-configmaps.yaml | Validate episodicBackend and emit AGE vs Neo4j database resources accordingly. |
| deployments/helm/README.md | Document backend selection semantics and AGE single-Postgres mode. |
| deployments/helm/Chart.yaml | Bump chart version and update description. |
| deployments/docker/postgres-age/Dockerfile | New image layering pgvector onto the apache/age base image. |
| USAGE.md | Update docs to reflect multiple episodic graph backends. |
| README.md | Update storage backend description to include AGE. |
| DOCKER_COMPOSE_README.md | Document AGE compose options and configuration shape. |
| .github/workflows/docker-image.yml | Add CI job to build/push the postgres-age companion image. |
```python
async def _label_exists(self, sanitized_label: str) -> bool:
    """Return True if the AGE label has a backing table in the graph schema.

    Reads use this to bail out early with empty results when a label has
    never been written to; recent AGE releases raise "label does not
    exist" on ``MATCH (n:Label)`` for unknown labels, so we cannot rely on
    Cypher returning zero rows.
    """
    await self._ensure_graph_initialized()
    async with self._engine.connect() as conn:
        result = await conn.exec_driver_sql(
            "SELECT 1 FROM pg_tables WHERE schemaname = $1 AND tablename = $2",
            (self._graph_name, sanitized_label),
        )
        return result.first() is not None

async def _ensure_label(self, sanitized_label: str, *, kind: str) -> None:
    """Create an AGE vertex or edge label if it does not already exist.

    AGE creates each label as a table in the graph's schema, so we detect
    existence via ``pg_tables`` rather than relying on ``ag_catalog``
    internals (which vary across AGE versions). The actual creation uses
    AGE's built-in ``create_vlabel`` / ``create_elabel`` helpers.
    """
    entity_type = EntityType.NODE if kind == "v" else EntityType.EDGE
    key = (entity_type, sanitized_label)
    if key in self._ensured_labels:
        return
    async with self._ensured_labels_lock:
        if key in self._ensured_labels:
            return
        await self._ensure_graph_initialized()
        async with self._engine.begin() as conn:
            exists = (
                await conn.exec_driver_sql(
                    "SELECT 1 FROM pg_tables "
                    "WHERE schemaname = $1 AND tablename = $2",
                    (self._graph_name, sanitized_label),
                )
            ).first()
            if exists is None:
                creator = "create_vlabel" if kind == "v" else "create_elabel"
                await conn.exec_driver_sql(
                    f"SELECT {creator}($1, $2)",
                    (self._graph_name, sanitized_label),
                )
                # ... (remainder of the method not shown in this diff hunk)
```
AGE label existence is detected via `pg_tables` (`tablename = sanitized_label`). PostgreSQL identifiers are limited to 63 bytes; if `sanitize_identifier(collection/relation)` produces a label longer than that, the underlying table name will be truncated by Postgres/AGE, causing `_label_exists()` to return False and `_ensure_label()` to repeatedly try `create_vlabel`/`create_elabel` (likely failing). Consider validating, truncating, or hashing `sanitized_label` to <=63 bytes, and using the same value consistently for the `create_*label` calls and the `pg_tables` lookups.
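One way to implement this suggestion, sketched under the assumption that labels are mostly ASCII identifiers: keep short names unchanged and replace the tail of over-long ones with a short SHA1 digest, so a single function feeds both the `create_vlabel`/`create_elabel` call and the `pg_tables` lookup. `fit_identifier` is a hypothetical helper:

```python
import hashlib

# PostgreSQL's default identifier limit (NAMEDATALEN - 1 bytes).
PG_MAX_IDENTIFIER_BYTES = 63


def fit_identifier(
    name: str, limit: int = PG_MAX_IDENTIFIER_BYTES, digest_len: int = 10
) -> str:
    """Return `name` if it fits, else a truncated prefix plus a hash suffix."""
    raw = name.encode("utf-8")
    if len(raw) <= limit:
        return name
    # Hashing the full name keeps distinct long names distinct after truncation.
    digest = hashlib.sha1(raw, usedforsecurity=False).hexdigest()[:digest_len]
    keep = limit - digest_len - 1  # room for "_" + digest
    # errors="ignore" drops a multi-byte character split at the cut point.
    prefix = raw[:keep].decode("utf-8", errors="ignore")
    return f"{prefix}_{digest}"
```

Because the result is deterministic, repeated calls for the same collection map to the same table, which is what keeps `_label_exists()` and `_ensure_label()` in agreement.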
```python
@staticmethod
def _ask_graph_backend() -> str:
    """Prompt the user for the episodic-memory graph backend."""
    choice = (
        input(
            "Which graph backend would you like to use for episodic "
            "memory? (neo4j/age) [neo4j]: "
        )
        .strip()
        .lower()
    )
    if choice in ("age", "a"):
        return "age"
    return "neo4j"

def install(self, prompt: bool = True) -> None:
    """Install and configure MemMachine."""
    graph_backend = self._ask_graph_backend()
    neo4j_started_by_installer = False
    # ... (rest of install() not shown in this diff hunk)
```
Installer.install() always calls _ask_graph_backend() (input prompt) even when install(prompt=False). Since prompt is forwarded to ConfigurationWizard to control interactive behavior, install() should also respect it (e.g., default to "neo4j" when prompt is False) to avoid blocking non-interactive runs/tests.
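A sketch of the suggested fix as a free function, with the prompt reader injected so tests can stub it. The `read` parameter is an assumption for testability, not part of the current `Installer` signature:

```python
from typing import Callable

PROMPT = (
    "Which graph backend would you like to use for episodic "
    "memory? (neo4j/age) [neo4j]: "
)


def ask_graph_backend(prompt: bool = True, read: Callable[[str], str] = input) -> str:
    """Return "age" or "neo4j"; never blocks on input when prompt is False."""
    if not prompt:
        # Non-interactive runs keep the existing default backend.
        return "neo4j"
    choice = read(PROMPT).strip().lower()
    if choice in ("age", "a"):
        return "age"
    return "neo4j"
```

`install()` would then pass its own `prompt` flag through. Unrecognized input still falls back to `neo4j`, matching the current method.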
```python
    # ...
    # embedding name for hydration.
    kind = "node" if entity_type is EntityType.NODE else "edge"
    key = f"{kind}|{sanitized_collection_or_relation}|{sanitized_embedding_name}"
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()[:16]
```
_vector_table_name() uses hashlib.sha1() without the usedforsecurity=False flag. On some FIPS-enabled Python builds this can raise at runtime; the same file already uses usedforsecurity=False for index-name hashing, so side-table hashing should do the same for consistency and FIPS compatibility.
Suggested change:

```diff
-digest = hashlib.sha1(key.encode("utf-8")).hexdigest()[:16]
+digest = hashlib.sha1(
+    key.encode("utf-8"), usedforsecurity=False
+).hexdigest()[:16]
```
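With the suggestion applied, side-table naming and its reverse mapping can be sketched as follows. The `vec_` prefix is a hypothetical placeholder, `EntityType` is a local stand-in for the store's enum, and a plain dict stands in for the per-graph registry table:

```python
import hashlib
from enum import Enum


class EntityType(Enum):
    # Local stand-in for the store's own EntityType enum.
    NODE = "node"
    EDGE = "edge"


def vector_table_name(
    entity_type: EntityType,
    sanitized_collection_or_relation: str,
    sanitized_embedding_name: str,
    prefix: str = "vec_",
) -> str:
    """Deterministic, length-bounded side-table name for one embedding."""
    kind = "node" if entity_type is EntityType.NODE else "edge"
    key = f"{kind}|{sanitized_collection_or_relation}|{sanitized_embedding_name}"
    # Not a security use: the hash only shortens the name, so flag it for FIPS builds.
    digest = hashlib.sha1(key.encode("utf-8"), usedforsecurity=False).hexdigest()[:16]
    return f"{prefix}{digest}"


def register(
    registry: dict[str, tuple[str, str, str]],
    entity_type: EntityType,
    collection: str,
    embedding_name: str,
) -> str:
    """Record the reverse mapping the hash discards (cleanup/hydration)."""
    name = vector_table_name(entity_type, collection, embedding_name)
    registry[name] = (entity_type.value, collection, embedding_name)
    return name
```

A 16-hex-digit digest keeps names far below the 63-byte limit; the registry supplies the (kind, collection, embedding) tuple that the hash throws away.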
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. If you are still working on this, please push a commit or leave a comment. Reviewers: please respond, or add the ...
Purpose of the change
Adds Apache AGE (openCypher over PostgreSQL, Apache-2.0) as a third supported graph backend behind `VectorGraphStore`, alongside Neo4j and NebulaGraph. Enables a single-Postgres deployment where episodic memory (graph) and semantic memory (pgvector) share one database service, and provides an Apache-2.0-compatible alternative to Neo4j that's operationally lighter than NebulaGraph's 3-role distributed topology.

Opened in response to #1324.
Description
MemMachine already requires Postgres + pgvector for semantic memory. The AGE-enabled Postgres image ships the `age` and `vector` extensions in the same Postgres, so with this PR a minimal AGE deployment collapses what was previously two database services (Neo4j + pgvector Postgres) into one. NebulaGraph already covers Apache-2.0 licensing for users at scale, but bundling it means running a 3-role distributed system; AGE fits single-instance and small-cluster deployments that want to drop Neo4j without adopting a new product.

What's added:

- `AgeVectorGraphStore` + `age_utils`: full `VectorGraphStore` implementation against AGE + pgvector. Graph structure lives in AGE; per-embedding similarity goes to pgvector side tables keyed by vertex `uid`, tracked via a per-graph registry table. Supports the full pgvector 1..16000 dimension range. Asserts AGE >= 1.6.0 on first use so older versions fail fast instead of hitting the cryptic "SET clause expects a map" runtime error.
- `AgeConf` + `DatabaseManager` plumbing symmetric with `Neo4jConf`.
- `/resources` status endpoint surfaces AGE backends (and, while touching that code, the pre-existing `nebula_graph` gap).
- `deployments/docker/postgres-age/Dockerfile` layering `postgresql-16-pgvector` on `apache/age:release_PG16_1.6.0`. Published as `memmachine/postgres-age:pg16-1.6.0` via a new build job in `.github/workflows/docker-image.yml`.
- `docker-compose.age.yml`: self-contained single-Postgres AGE stack (one command). Base `docker-compose.yml` keeps `postgres-age` under an opt-in `age` profile for the "run AGE alongside a pip-installed server" flow.
- Helm (chart `0.1.0` -> `0.2.0`): new top-level `episodicBackend: neo4j | age` toggle, orthogonal to per-backend `*.enabled` flags. In AGE mode the standalone pgvector `postgres` Deployment is automatically skipped and all relational workloads ride on the AGE-enabled Postgres. Validation guard fails cleanly on invalid values.
- Installer: `memmachine-configure` prompts for backend choice and routes to the existing Neo4j auto-install or the wizard's interactive AGE prompts. Wizard emits a paired `AgeConf` + co-located `SqlAlchemyConf` so semantic memory's pgvector store shares the AGE Postgres.
- Bug fix: `semantic_memory.database` previously pointed at `NEO4J_DB_ID` (a graph backend, not a relational one).
- Sample config: `sample_configs/episodic_memory_config.age.sample`.
- Tests: unit tests for `age_utils`, `AgeConf` lifecycle tests in `test_database_manager`, wizard tests for AGE and invalid-backend paths, and a testcontainer-backed integration test built from the shipped Dockerfile.

Dependencies: none added; `asyncpg`, `pgvector`, `sqlalchemy`, and `alembic` are already pinned in `packages/server/pyproject.toml`.

Fixes/Closes
Fixes #(issue number)
Type of change
How Has This Been Tested?
Unit tests (no Docker required):
Integration test (requires Docker; built from the shipped `deployments/docker/postgres-age/Dockerfile` so the test environment matches prod):

Helm template verified on all four deployment scenarios plus invalid-input guard:
Docker Compose verified:
Test Results:
- ... `postgres-age`), `db_postgres` and `db_age` both target it, one `wait-for-postgres` init container, `POSTGRES_PASSWORD` sourced from `postgres-age-secret`. No `postgres-deployment`, no `neo4j-deployment`.
- `Error: ... episodicBackend must be one of: neo4j, age (got "foobar")` - expected.
- ... `ast.parse`; all YAML files pass `docker compose config --quiet`.

Checklist
Maintainer Checklist
Screenshots/Gifs
N/A - backend + deployment change
Further comments
- Chart version bumped `0.1.0` -> `0.2.0` because templates and values changed materially.
- `semantic_memory.database` was previously set to `NEO4J_DB_ID`, a pre-existing bug since `SemanticMemoryConf.database` expects a relational backend. It's now left unset in Neo4j mode so `SemanticMemoryConf`'s existing `_auto_disable_when_incomplete` validator fires cleanly; in AGE mode it targets the companion `SqlAlchemyConf`. Users who relied on the broken behavior will see semantic memory cleanly disabled instead of failing obscurely at runtime.
- With `episodicBackend=age`, `postgres.enabled` is ignored because the postgres-age image already ships `vector`. This is intentional and documented in `values.yaml` / `deployments/helm/README.md`.
- `memmachine/postgres-age:pg16-1.6.0` doesn't exist in the registry yet: the new CI job publishes it on the first release cut with this PR merged. Until then, helm deployments will need to build and push the image locally (documented in `values.yaml`).
- In AGE 1.6.0, `SET n += $param` inside `cypher()` fails with "SET clause expects a map"; it is aliased via `WITH $param AS p` before `SET`. The version probe enforces AGE >= 1.6.0 with a clear error.
- The `postgres-age` image build is pinned to `linux/amd64`: apache/age has had inconsistent arm64 availability historically. Easy flip to multi-arch once confirmed upstream.

Happy to split any subsystem out into a follow-up if reviewers prefer a tighter scope (e.g. the NebulaGraph `/resources` status addition, or the wizard's `semantic_memory.database` fix), but they're small enough that I've kept them bundled.