Web app that restores Proxmox VE virtual machines from Pure Storage FlashArray
snapshots. Disks are cloned via `lvm-xcopy` so the bytes never traverse the
host: SCSI EXTENDED COPY on iSCSI/FC, and NVMe Copy (cross-namespace,
Format 2h / TP4130) on NVMe-TCP. Falls back to a host-side `qemu-img convert`
only when an NVMe controller does not support cross-namespace Copy, the
restore is cross-array, or the operator explicitly requests it.
- Multi-cluster: connect one or more Proxmox clusters (API token or password auth) and one or more Pure FlashArrays (API token).
- Automatic 1:1 mapping of Pure volumes ↔ Proxmox LVM storages via the SCSI serial / NVMe WWN of the PV backing each VG (see the sketch after this list).
- Per-node Pure host auto-match: if no host group is configured for a Proxmox connection, the backend reads each node's IQN / NQN / WWN and matches it against hosts defined on the array.
- Tree view of VMs → on-array snapshots (ad-hoc and protection-group).
- On-demand snapshot creation on the array.
- Two restore modes:
  - Overwrite the source VM's disks in place (VM is stopped first).
  - Create a new VM with the source's exact configuration, fresh LVs, and optionally preserved MAC addresses.
- Boot option: bring the restored VM up with all NICs `link_down=1` so it cannot reach the network until the operator explicitly re-enables them.
- Background job runner with live log tail per restore (polled via REST).
- Background inventory refresh: a detached asyncio task walks every configured Proxmox cluster on a fixed cadence (default 600s, configurable via `APP_INVENTORY_REFRESH_SECONDS`; set to `0` to disable) so deleted VMs and new Pure snapshots stay reflected in the local DB without an operator clicking Refresh.
- Deleted-VM surfacing: when a VM is destroyed in Proxmox the app keeps its last-seen config in `remembered_vms` keyed by `(vmid, vm_create_time)`, so multiple incarnations of the same VMID coexist. As long as a Pure snapshot of the deleted incarnation's disks survives, the inventory tree still lists it (with a deleted pill) and `new_vm` restores remain available — pinned to the exact remembered row by id.
- Predates-disk safety: the inventory API records the first time it observed each VM-disk tuple (anchored to the latest Proxmox `qmcreate` task so VMID reuse is handled correctly) and refuses restores — HTTP 409 with a descriptive message — when the chosen snapshot was taken before the current disk existed. The UI disables the Restore button and shows a red "predates disk" pill for those snapshots. For deleted VMs the check anchors on the remembered incarnation's own create/last-seen time instead of the shared sighting row, so a VMID-reusing successor can't poison the deleted VM's view.
- Force host-copy override: an opt-in checkbox on the restore dialog bypasses `lvm-xcopy` / NVMe Copy and forces a host-side `qemu-img convert` even on intra-array restores. Useful as a manual escape hatch when a Pure firmware revision rejects cross-namespace NVMe Copy (status `0x4002`, Invalid Field in Command); the host path always works at the cost of read+write bandwidth across the SAN.
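For illustration, the serial-to-WWN matching behind the automatic storage mapping reduces to a dictionary join. This is a minimal sketch with hypothetical helper names, not the actual `backend/app/services/mapping` code; the `3624a9370` + serial WWN convention is the one the restore pipeline also uses to locate PVs:

```python
# Sketch: map Proxmox LVM storages to Pure volumes by PV WWN.
# Hypothetical helpers; the real logic lives in backend/app/services/mapping.

PURE_SCSI_WWN_PREFIX = "3624a9370"  # NAA prefix Pure uses for SCSI volumes

def scsi_wwn_for_serial(serial: str) -> str:
    """WWN of the multipath device backing a Pure volume (iSCSI/FC)."""
    return PURE_SCSI_WWN_PREFIX + serial.lower()

def match_storages(volume_serials: dict[str, str],
                   vg_pv_wwns: dict[str, str]) -> dict[str, str]:
    """volume_serials: Pure volume name -> array serial.
    vg_pv_wwns: Proxmox VG name -> WWN of the single PV backing it.
    Returns VG -> volume for every exact match; unmatched VGs are the
    ones the real service flags as unmapped."""
    by_wwn = {scsi_wwn_for_serial(s): vol for vol, s in volume_serials.items()}
    return {vg: by_wwn[wwn] for vg, wwn in vg_pv_wwns.items() if wwn in by_wwn}
```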
```mermaid
flowchart LR
subgraph Browser["User Browser"]
UI["React + Vite SPA<br/>(Tailwind, React Query)"]
end
subgraph Host["Docker host"]
subgraph FE["pps-frontend container"]
NGX["nginx :443 (TLS)<br/>:80 → 301 → :443"]
end
subgraph BE["pps-backend container"]
API["FastAPI (uvicorn) :8000<br/>routers: auth, connections,<br/>inventory, restore, security"]
DB[("SQLite /data/app.db<br/>users, connections,<br/>vm_disk_sightings,<br/>remembered_vms,<br/>restore_jobs")]
Jobs["Job runner<br/>(asyncio.create_task)<br/>+ log flusher"]
Refresh["Inventory refresh<br/>(periodic asyncio task,<br/>default every 600s)"]
end
end
subgraph Proxmox["Proxmox cluster"]
PAPI["Proxmox API :8006<br/>(proxmoxer)"]
NODE["Proxmox node(s)<br/>rescan-scsi-bus, multipath,<br/>vgimportclone, lvm-xcopy, dd"]
end
subgraph Pure["Pure FlashArray"]
PREST["FlashArray REST<br/>(py-pure-client)"]
PVOL[("Volumes +<br/>Snapshots")]
end
UI -- "HTTPS /api/*" --> NGX
NGX -- "proxy_pass /api/" --> API
API --- DB
API --- Jobs
Refresh --- DB
Refresh -- "GET /tree path:<br/>upsert remembered_vms +<br/>vm_disk_sightings" --> PAPI
Refresh -- "list snapshots" --> PREST
Jobs -- "proxmoxer: VM list,<br/>config, tasks, clone" --> PAPI
Jobs -- "py-pure-client:<br/>copy snap → temp vol,<br/>connect host/HG, list snaps" --> PREST
Jobs -- "asyncssh: pvscan,<br/>vgimportclone, lvm-xcopy / dd,<br/>cleanup" --> NODE
PREST --- PVOL
PAPI -.- NODE
PVOL -- "iSCSI / FC: XCOPY offload<br/>NVMe-TCP: NVMe Copy F2h<br/>(host-side qemu-img fallback)" --> NODE
| Piece | Where | Purpose |
|---|---|---|
| `frontend/` | React 18 + Vite + Tailwind, served by nginx in `pps-frontend` | SPA; nginx also reverse-proxies `/api/*` to the backend container |
| `backend/app/api/` | FastAPI routers: auth, connections, inventory, restore, security | Stateless HTTP surface |
| `backend/app/services/` | `proxmox` (proxmoxer), `pure` (py-pure-client), `ssh` (asyncssh), mapping, restore, inventory_refresh, context, crypto, security, tls | Integrations, restore orchestration, periodic inventory refresh |
| `backend/app/models/` | SQLAlchemy 2.x async | `User`, `ProxmoxConnection`, `PureConnection`, `ProxmoxPureLink`, `SshCredential`, `VmDiskSighting`, `RememberedVm`, `RestoreJob` |
| SQLite | `./data/app.db` (bind-mounted into the backend container) | Persists users, encrypted connection secrets, disk sightings, remembered VM configs, job history + logs |
```mermaid
sequenceDiagram
autonumber
participant U as Operator UI
participant API as FastAPI /api/restore
participant DB as SQLite
participant J as Job runner asyncio
participant PX as Proxmox API
participant SSH as Proxmox node ssh
participant PU as Pure FlashArray
U->>API: POST /api/restore kind vmid snapshot
API->>DB: load sightings + storage mappings
API->>PU: list snapshots and validate created_at
API-->>U: 409 if snapshot predates any disk
API->>DB: insert RestoreJob pending
API-->>U: 201 returning job_id
API->>J: asyncio.create_task _run_job
J->>PU: copy snapshot to temp volume pxrestore-XXXX
J->>PU: connect temp vol to host group or matched per-node host
J->>SSH: rescan-scsi-bus and multipath -r
J->>SSH: find PV by WWN under /dev/mapper or /dev/nvme
J->>SSH: vgimportclone to pxrestore_XXXX VG
J->>SSH: vgchange -ay then list LVs as JSON
alt options.force_host_copy OR cross-array
J->>SSH: lvcreate then qemu-img convert with O_DIRECT
else native offload
J->>SSH: lvm-xcopy src to dst array offload
Note over SSH: SCSI/FC EXTENDED COPY<br/>NVMe-TCP NVMe Copy F2h TP4130
opt NVMe controller lacks cross-namespace Copy
J->>SSH: fallback lvcreate then qemu-img convert
end
end
opt new_vm flow
J->>PX: allocate new VMID and disks replicate config
end
opt overwrite flow
J->>PX: stop source VM
end
J->>SSH: vgchange -an pxrestore_XXXX then vgremove -f
J->>PU: disconnect and delete temp volume
J->>SSH: rescan and drop stale LUN
J->>DB: RestoreJob.status success or failed plus error
```
- Stage. Pure API `post_volumes(..., source=<snapshot>)` creates a metadata-only clone `pxrestore-<tag>`. The temp volume is then connected to either the configured host group or the target node's matched Pure host.
- Attach. SSH into the target node: `rescan-scsi-bus.sh -r`, `iscsiadm -m session --rescan` (iSCSI), `multipath -r`, `pvscan --cache --activate ay`. Locate the new PV by WWN (`3624a9370<serial>` for SCSI, `nvme*` by-id for NVMe-TCP).
- Import. `vgimportclone --basevgname pxrestore_<tag> <device>` avoids a UUID collision with the source VG, then `vgchange -ay`.
- Copy. Per VM disk LV: `lvm-xcopy /dev/pxrestore_<tag>/<lv> /dev/<target_vg>/<lv>`. The driver picks the right offload primitive automatically (see the sketch after this list):
  - SCSI/FC PV → SCSI EXTENDED COPY (LID1) via `SG_IO`.
  - NVMe-TCP PV → NVMe Copy with Source Descriptor Format 2h (TP4130 cross-namespace); CDFE is enabled on the controller automatically.
  - If the NVMe controller / firmware rejects Format 2h (status `0x4002`), or the destination LV lives on a different Pure array (cross-array restore), or the operator ticked Force host copy in the restore dialog, the runner uses host-side `lvcreate -L <bytes>B` plus `qemu-img convert -n -t none -T none -W -m 8 -f raw -O raw`. Parallel coroutines + O_DIRECT measured ~3x the throughput of `dd bs=8M` on Pure NVMe-TCP because qemu-img keeps multiple I/Os in flight while dd serializes one block at a time.
- Finalize. Overwrite mode stops the source VM before the copy; new-VM mode allocates fresh LVs and replicates the source config (optionally preserving MACs, and optionally starting with `link_down=1`).
- Cleanup. `vgchange -an`, `vgremove -f` the temp VG; Pure `delete_connections` + `delete_volumes` the temp volume; one more rescan to evict the stale LUN.
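The decision between offload and host copy in the Copy step reduces to a small predicate. A condensed sketch follows; `CopyPlan` and `copy_command` are illustrative names, not the restore service's real API:

```python
# Sketch of the copy-path selection described in the Copy step above.
# CopyPlan/copy_command are illustrative, not the service's real API.
from dataclasses import dataclass

@dataclass
class CopyPlan:
    src_lv: str            # /dev/pxrestore_<tag>/<lv> (imported clone VG)
    dst_lv: str            # /dev/<target_vg>/<lv>
    cross_array: bool      # destination VG lives on a different FlashArray
    force_host_copy: bool  # operator checkbox on the restore dialog

def copy_command(plan: CopyPlan) -> list[str]:
    if plan.force_host_copy or plan.cross_array:
        # Host-side path: bytes cross the SAN twice, but it always works.
        return ["qemu-img", "convert", "-n", "-t", "none", "-T", "none",
                "-W", "-m", "8", "-f", "raw", "-O", "raw",
                plan.src_lv, plan.dst_lv]
    # lvm-xcopy picks EXTENDED COPY (SCSI/FC) or NVMe Copy F2h itself;
    # on an NVMe 0x4002 rejection the runner re-issues the host-side command.
    return ["lvm-xcopy", plan.src_lv, plan.dst_lv]
```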
VMID reuse on Proxmox (destroy VM 120, later recreate VM 120) can make an old snapshot "look" compatible by VMID alone, but that snapshot was taken against a totally different volume. The inventory layer is built around a small identity model that catches this and also keeps deleted VMs restorable:
- Every call to `GET /api/inventory/{proxmox_id}/tree` walks Proxmox task history for each node, picks the latest successful `qmcreate` per VMID, and uses that timestamp as the canonical `vm_create_time`.
- `vm_disk_sightings` upserts one row per `(proxmox_connection_id, vmid, storage, volume)` with `first_seen_at = <qmcreate start_time>` (or "now" for cold-start when task history is unavailable, in which case predates checks are suppressed for that disk on the current response). Later refreshes realign `first_seen_at` if a newer creation event appears.
- `remembered_vms` snapshots the live VM config and is keyed by `(proxmox_connection_id, vmid, vm_create_time)`, so multiple incarnations of the same VMID coexist as separate rows. Each restore from a deleted VM is pinned to a specific row id (`remembered_vm_id` on the restore request) so the right incarnation's config is replayed.
- A snapshot is flagged `predates_disk=true` when its Pure `created` time is earlier than `first_seen_at` for any current disk on a live VM, or earlier than `vm_create_time` (or `last_seen_at` as a fallback) for a deleted VM (see the sketch after this list). Anchoring deleted VMs on their own remembered timestamps avoids leaking a VMID-reusing successor's disk birthdate into the deleted view.
- `POST /api/restore` re-runs the same check server-side and returns HTTP 409 with a message like `Snapshot '…test' was taken at 2026-04-23T13:01:35Z but disk 'vm-120-disk-0' on VM 120 was first observed at 2026-04-23T14:16:45Z; the snapshot predates this disk and cannot contain its LV. Refusing restore.`
- The UI disables the Restore button and renders a red "predates disk" pill with a tooltip explaining why.
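Reduced to code, the check is a single timestamp comparison with the anchor chosen per the rules above. A sketch with a hypothetical standalone signature; the real check lives in the tree builder and is re-run by `POST /api/restore`:

```python
# Sketch of the predates-disk rule described above. Hypothetical signature;
# the real check runs inside the inventory tree builder and POST /api/restore.
from datetime import datetime
from typing import Optional

def snapshot_predates_disk(snapshot_created: datetime,
                           deleted_vm: bool,
                           first_seen_at: Optional[datetime],
                           vm_create_time: Optional[datetime],
                           last_seen_at: Optional[datetime]) -> bool:
    if deleted_vm:
        # Deleted VMs anchor on their own remembered incarnation, never on
        # a VMID-reusing successor's sighting rows.
        anchor = vm_create_time or last_seen_at
    else:
        # Cold-start sightings without qmcreate task history have no anchor;
        # the check is suppressed rather than guessed.
        anchor = first_seen_at
    return anchor is not None and snapshot_created < anchor
```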
Two concerns drive the periodic refresh:
- The `remembered_vms` / `vm_disk_sightings` upsert only happens inside `GET /api/inventory/{proxmox_id}/tree`. If a VM is destroyed in Proxmox while no operator is in the UI, the next inventory load finds the VM already gone and would have no remembered config to restore from.
- New Pure snapshots taken between operator visits should still surface in the tree without a manual refresh.

The backend therefore spawns a detached asyncio task on startup
(`app.services.inventory_refresh.start_periodic_refresh`) that walks every
`ProxmoxConnection` and calls `get_tree` on a fixed cadence. Each connection
runs in its own session so a transient failure on one cluster does not block
the others. Cadence is controlled by `APP_INVENTORY_REFRESH_SECONDS`
(default 600); set it to `0` to disable the loop entirely.
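The shape of that loop, condensed. `refresh_one_connection` is a stand-in for the per-cluster `get_tree` walk; the real implementation in `app.services.inventory_refresh` also manages DB session lifecycle and shutdown:

```python
# Condensed shape of the background refresh task; names other than
# start_periodic_refresh are illustrative stand-ins.
import asyncio
import logging

async def refresh_one_connection(cid: int) -> None:
    """Stand-in for the per-cluster get_tree walk in the real service."""
    ...

async def _refresh_loop(interval: float, connection_ids: list[int]) -> None:
    while True:
        for cid in connection_ids:
            try:
                await refresh_one_connection(cid)  # own DB session per cluster
            except Exception:
                # One failing cluster must not block the others.
                logging.exception("inventory refresh failed for %s", cid)
        await asyncio.sleep(interval)

def start_periodic_refresh(interval: float, connection_ids: list[int]):
    if interval <= 0:  # APP_INVENTORY_REFRESH_SECONDS=0 disables the loop
        return None
    # Detached task: fire-and-forget from FastAPI startup.
    return asyncio.get_running_loop().create_task(
        _refresh_loop(interval, connection_ids))
```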
On each Proxmox node the app SSHes into, the following packages are needed:
- `sg3-utils` (for `rescan-scsi-bus.sh`)
- `multipath-tools`, if you use multipath
- `lvm2` (present by default)
- `nvme-cli`, if any target storage is NVMe-TCP (used for initiator identification and rescans)
- `git`, `build-essential` — only needed once, to build `lvm-xcopy`; the backend installs `lvm-xcopy` into `/usr/local/lib/lvm-xcopy` with a `/usr/local/bin/lvm-xcopy` launcher on first use. You can pre-install manually:

```sh
git clone --depth 1 https://github.com/PureStorage-OpenConnect/lvm-xcopy
cd lvm-xcopy && make && sudo install -m 0755 lvm-xcopy /usr/local/bin/lvm-xcopy
```
On the Pure array, one of:
- Host group (recommended for clusters): create a host group containing every Proxmox node as a Pure host, and set its name as `pure_host_group` on the Proxmox connection. The temp volume is connected to the host group during a restore.
- Per-node hosts: leave `pure_host_group` blank. The backend reads each node's IQN (`/etc/iscsi/initiatorname.iscsi`), NQN (`/etc/nvme/hostnqn`), and FC WWNs (`/sys/class/fc_host/.../port_name`), and matches them to hosts already defined on the array (see the sketch below). The temp volume is connected only to the matched host for the target node.
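A sketch of that per-node identity read. The backend runs these reads over asyncssh rather than locally, and normalizes WWN formatting before comparing; both simplifications here are assumptions:

```python
# Sketch: gather a node's initiator identities and match a Pure host.
# The backend does these reads over asyncssh; local Path reads and the
# matching helper are illustrative simplifications.
from pathlib import Path

def node_initiator_ids() -> set[str]:
    ids: set[str] = set()
    iscsi = Path("/etc/iscsi/initiatorname.iscsi")   # InitiatorName=iqn.…
    if iscsi.exists():
        for line in iscsi.read_text().splitlines():
            if line.startswith("InitiatorName="):
                ids.add(line.split("=", 1)[1].strip())
    nqn = Path("/etc/nvme/hostnqn")                  # nqn.2014-08.org.nvmexpress:…
    if nqn.exists():
        ids.add(nqn.read_text().strip())
    for port in Path("/sys/class/fc_host").glob("*/port_name"):
        ids.add(port.read_text().strip())            # FC WWN of each HBA port
    return ids

def match_pure_host(node_ids: set[str],
                    array_hosts: dict[str, set[str]]) -> str | None:
    """array_hosts: Pure host name -> its configured IQNs/NQNs/WWNs."""
    for name, host_ids in array_hosts.items():
        if node_ids & host_ids:
            return name
    return None
```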
Plus, on the array:
- An API token with privileges to create/destroy volumes and connections and to read snapshots.
On Proxmox:
- An API token (recommended) or a user/password with permissions on `VM.Config.*`, `VM.PowerMgmt`, `Datastore.Allocate`, `Datastore.AllocateSpace`, `Datastore.Audit`, `Sys.Audit`, `VM.Audit`.
- SSH access to every node, either as root or as a user in the `sudo` group that can run `lvm`, `vgimportclone`, `rescan-scsi-bus.sh`, `multipath`, `iscsiadm`, `nvme`, `dd`, and `lvm-xcopy` without a password prompt.
The app runs entirely in containers and reaches Proxmox over the API (:8006) and SSH (:22). Deploy it on a separate Linux machine that can run Docker and has network access to both your Proxmox cluster(s) and your Pure FlashArray management interfaces.
Two supported flows. The first is the recommended one for production; the second is for local development or air-gapped sites.
Images are published to GitHub Container Registry by the Release
workflow on every push to `main` and on every `vX.Y.Z` git tag. The
container host needs only Docker — no source tree, no build toolchain.
One-shot install on a fresh Linux container host (run as root):

```sh
curl -fsSL https://raw.githubusercontent.com/PureStorage-OpenConnect/proxmox-pure-snap-restore/main/deploy/install.sh \
  | PPS_OWNER=purestorage-openconnect PPS_IMAGE_TAG=v0.2.1 bash
```

The script:
- Installs Docker via the upstream `get.docker.com` script (or `deploy/install_docker.sh` if you cloned the repo first).
- Lays out `/opt/proxmox-pure-snap-restore/{data,docker-compose.yml,.env}`.
- Generates `APP_SECRET_KEY` + `APP_ENCRYPTION_KEY` and writes the image coordinates (`PPS_BACKEND_IMAGE`, `PPS_FRONTEND_IMAGE`, `PPS_IMAGE_TAG`) into `.env`.
- Runs `docker compose pull && docker compose up -d`.
After install:

```sh
cd /opt/proxmox-pure-snap-restore
deploy/upgrade.sh v0.2.2   # pin a new tag and pull
deploy/upgrade.sh v0.2.1   # roll back the same way
docker compose logs -f     # tail
```

Pinning a specific git tag in `.env` (`PPS_IMAGE_TAG=v0.2.1`) is the
intended production posture; `latest` is fine for staging.
```sh
cp .env.example .env
# Generate keys:
python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
# Paste into APP_ENCRYPTION_KEY. Also set APP_SECRET_KEY and APP_ADMIN_PASSWORD.
docker compose up -d --build
# UI:  https://<host>:8443
# API: http://<host>:8000/api (direct, internal use only)
```

For an existing remote host that already has the repo, `make deploy-remote REMOTE_HOST=root@host` rsyncs a tarball, runs
`deploy/remote_deploy.sh`, and rebuilds locally on the host. Useful when
the host can't reach ghcr.io.
The first boot generates a self-signed TLS cert into `./data/certs/`
(`cert.pem` + `key.pem`). Replace it from the Settings → Security tab
in the UI: upload your own PEM cert and key, or click Generate &
install to mint a fresh self-signed cert. The frontend container
receives a SIGHUP and reloads nginx without dropping HTTP traffic.

The SQLite DB lives under `./data/app.db` (bind-mounted into the backend
container). The initial admin user is seeded from `APP_ADMIN_USERNAME` /
`APP_ADMIN_PASSWORD` on first boot only. Leaving `APP_ADMIN_PASSWORD` blank
seeds the admin with no password; the first sign-in is forced through a
"set new password" page before any other part of the UI is reachable.
| Var | Default | Purpose |
|---|---|---|
| `APP_SECRET_KEY` | `dev-insecure-change-me` | JWT signing key (HS256) |
| `APP_ENCRYPTION_KEY` | — | Fernet key that encrypts secrets at rest |
| `APP_DB_URL` | `sqlite+aiosqlite:////data/app.db` | Async SQLAlchemy URL |
| `APP_CORS_ORIGINS` | `http://localhost:5173` | Comma-separated allowlist |
| `APP_ADMIN_USERNAME` | `admin` | Seeded on first boot |
| `APP_ADMIN_PASSWORD` | (blank) | Seeded on first boot; blank forces change-on-first-login |
| `APP_JWT_EXPIRES_MINUTES` | `480` | Session length |
| `APP_LOG_LEVEL` | `INFO` | Uvicorn/app log level |
| `APP_INVENTORY_REFRESH_SECONDS` | `600` | Cadence of the background inventory refresh task; `0` disables it |
| `APP_TLS_CERT_DIR` | `/data/certs` | Where the backend writes/reads `cert.pem` + `key.pem` for the frontend |
| `APP_FRONTEND_CONTAINER` | `pps-frontend` | Container name SIGHUP'd after a TLS cert update |
| `APP_DOCKER_SOCKET` | `/var/run/docker.sock` | Path to the docker socket used to signal the frontend |
The image-coordinate variables (read by `docker-compose.prod.yml`) live
in the same `.env` file:

| Var | Default | Purpose |
|---|---|---|
| `PPS_BACKEND_IMAGE` | `ghcr.io/purestorage-openconnect/proxmox-pure-snap-restore-backend` | Backend image reference |
| `PPS_FRONTEND_IMAGE` | `ghcr.io/purestorage-openconnect/proxmox-pure-snap-restore-frontend` | Frontend image reference |
| `PPS_IMAGE_TAG` | `latest` | Tag pulled by `docker compose pull`. Pin a `vX.Y.Z` tag for production |
| `PPS_HTTP_PORT` | `8080` | Host port that nginx serves HTTP on (redirects to HTTPS) |
| `PPS_HTTPS_PORT` | `8443` | Host port that nginx serves HTTPS on |
Two GitHub Actions workflows live under `.github/workflows/`:

- `ci.yml` runs on every PR and push to `main`:
  - Builds `backend/Dockerfile --target test` and runs `pytest -v` inside that image, so the test environment is bit-for-bit identical to the runtime image (same Python, same wheels) plus the `[dev]` extras.
  - Builds the frontend with `npm install && npm run build` to catch TypeScript and Vite breakage before merge.
- `release.yml` runs on push to `main` and on `v*.*.*` git tags. It builds `backend` (target `runtime`) and `frontend` images and pushes them to `ghcr.io/purestorage-openconnect/proxmox-pure-snap-restore-{backend,frontend}` with the canonical tag set produced by `docker/metadata-action`:
  - `:main` and `:sha-<short>` on every push to `main`
  - `:vX.Y.Z`, `:X.Y`, `:X`, and `:latest` on each `v*.*.*` git tag

Cutting a release is therefore:

```sh
git tag v0.2.1
git push origin v0.2.1
# Wait for the Release workflow to finish, then on the host:
ssh root@<host> "cd /opt/proxmox-pure-snap-restore && deploy/upgrade.sh v0.2.1"
```

The image visibility on GHCR defaults to private. Make the two packages
public from the GitHub Packages tab if you want the install script to
work without `docker login`.
A Makefile wraps the common operations:

```sh
make test               # backend pytest in the test image
make lint               # ruff + mypy in the test image
make frontend-build     # vite build in a node:20 container
make build TAG=dev      # build both runtime images locally
make push TAG=dev REGISTRY=ghcr.io/alice
make up / make down     # docker compose up -d / down (dev compose)
make deploy-remote REMOTE_HOST=root@docker-host.example.com
make upgrade-remote REMOTE_HOST=root@docker-host.example.com
```

All routes are under `/api`. Auth is JWT bearer on everything except
`/api/auth/login` and `/api/health`. Admin-only routes enforce
`require_admin`.
| Method | Path | Purpose |
|---|---|---|
| POST | `/api/auth/login` | Exchange credentials for a JWT |
| GET | `/api/auth/me` | Current user (username, role, must_change_password) |
| POST | `/api/auth/change-password` | Change current user's password |
| GET | `/api/connections/proxmox` | List Proxmox connections (admin) |
| POST | `/api/connections/proxmox` | Create Proxmox connection (admin) |
| PATCH/DELETE | `/api/connections/proxmox/{id}` | Update/remove (admin) |
| POST | `/api/connections/proxmox/{id}/test` | Live ping the Proxmox API |
| GET / POST / PATCH / DELETE | `/api/connections/pure[...]` | Same for Pure arrays |
| GET / POST / PATCH / DELETE | `/api/connections/ssh[...]` | SSH credentials |
| GET | `/api/inventory/{proxmox_id}/tree` | Full VM → disks → snapshots tree, with mapping diagnostics, deleted-VM rows from `remembered_vms`, and `predates_disk` flags |
| POST | `/api/inventory/snapshots` | Create an ad-hoc Pure snapshot on a volume |
| POST | `/api/restore` | Start a restore (admin); accepts `force_host_copy` to bypass array offload, and `remembered_vm_id` to pin a deleted-VM restore to a specific incarnation; returns 409 if the snapshot predates any disk |
| GET | `/api/restore` | List recent jobs |
| GET | `/api/restore/{id}` | Job detail incl. streaming log buffer |
| GET | `/api/security/tls` | Current TLS cert status (admin) |
| POST | `/api/security/tls/upload` | Upload custom cert + key PEM (admin) |
| POST | `/api/security/tls/regenerate` | Mint a fresh self-signed cert (admin) |
| POST | `/api/security/tls/reload` | SIGHUP the frontend nginx to pick up the new cert (admin) |
| GET | `/api/health` | Liveness probe |
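Put together, a scripted restore against this surface might look like the following. Payload and response field names beyond what the table documents (`access_token`, the restore body shape) are assumptions; `verify=False` is only because of the default self-signed cert:

```python
# Hypothetical scripted restore using the routes above; JSON field names
# beyond those documented in this README are assumptions.
import time
import requests

BASE = "https://pps.example.com/api"
VERIFY = False  # default cert is self-signed; point at your CA instead

tok = requests.post(f"{BASE}/auth/login", verify=VERIFY,
                    json={"username": "admin", "password": "s3cret"}).json()
hdrs = {"Authorization": f"Bearer {tok['access_token']}"}  # assumed field

# Inventory tree for Proxmox connection 1: VMs, disks, snapshots, flags.
tree = requests.get(f"{BASE}/inventory/1/tree", headers=hdrs, verify=VERIFY).json()

# Start a new-VM restore; a 409 here means the snapshot predates a disk.
resp = requests.post(f"{BASE}/restore", headers=hdrs, verify=VERIFY,
                     json={"kind": "new_vm", "vmid": 120,      # assumed shape
                           "snapshot": "vm120-snap",
                           "force_host_copy": False})
resp.raise_for_status()
job_id = resp.json()["job_id"]  # the 201 response carries job_id

while True:  # poll the job detail for the streaming log buffer
    d = requests.get(f"{BASE}/restore/{job_id}", headers=hdrs, verify=VERIFY).json()
    if d["status"] in ("success", "failed"):
        break
    time.sleep(2)
```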
```sh
# Backend
cd backend
python -m venv .venv && . .venv/Scripts/activate   # or: source .venv/bin/activate
pip install -e .[dev]
uvicorn app.main:app --reload

# Frontend
cd frontend
npm install
BACKEND_URL=http://localhost:8000 npm run dev
```

- All credentials (Proxmox secret, Pure API token, SSH private key and passphrase, SSH password) are encrypted at rest using Fernet with `APP_ENCRYPTION_KEY` (see the sketch after this list). If you lose that key, the secrets are unrecoverable.
- Passwords are hashed with Argon2id.
- The restore orchestrator performs destructive operations. Overwrite mode stops the source VM before the copy; it does not take a confirmation snapshot automatically — consider triggering an ad-hoc Pure snapshot first (the Inventory page has a "Snapshot now" action for this).
- `vm_disk_sightings` + the 409 predates check reduce the blast radius of operator error with VMID reuse, but they are not a substitute for array-side snapshot retention policies.
- Multi-user with RBAC is scaffolded (`role` column on `users`, `require_admin` dep) but only a single admin is seeded today.
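The at-rest scheme in miniature. This is just the `cryptography` library's Fernet API, which `APP_ENCRYPTION_KEY` feeds directly:

```python
# Fernet at-rest encryption in miniature; APP_ENCRYPTION_KEY is exactly
# such a key, and the ciphertext is what lands in SQLite.
from cryptography.fernet import Fernet

key = Fernet.generate_key()        # what the install script writes into .env
f = Fernet(key)
ct = f.encrypt(b"pure-api-token")  # stored instead of the plaintext secret
assert f.decrypt(ct) == b"pure-api-token"
# Decrypting with any other key raises InvalidToken, which is why a lost
# APP_ENCRYPTION_KEY makes the stored secrets unrecoverable.
```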
- Assumes 1:1 Pure volume ↔ LVM VG mapping. VGs spanning multiple PVs are flagged unmapped and not restorable.
- qcow2 / raw-file storages are not supported; the copy path requires LVM thick volumes so that `lvm-xcopy` (or the host-copy fallback) has an LV on both sides.
- For NVMe-TCP storage, cross-namespace offload requires a controller that supports NVMe Copy Format 2h (TP4130) and advertises it in `OCFS`. On controllers without TP4130 support the runner falls back to the host-side copy path (`qemu-img convert`), in which case restore time scales with network + device bandwidth rather than being metadata-only.
- Protection-group snapshot naming shows up naturally in the tree (snapshot name prefixed with the PG), but the app does not yet let you create PGs.
- No cluster-wide HA awareness: a restore runs against a specific node.
Licensed under the Apache License, Version 2.0. Redistributions
must retain the copyright, license, and any NOTICE file per §4 of the
license.