Merged
27 commits
4cc1676
emulator pull progress
BilalG1 Apr 15, 2026
a65022b
emulator fast-start via VM snapshot + live secret rotation
BilalG1 Apr 15, 2026
30dbdff
faster snapshot resume via mapped-ram + rotation opt-out
BilalG1 Apr 15, 2026
6021a04
build QEMU 10.2.2 from source in CI for mapped-ram support
BilalG1 Apr 15, 2026
0c0d726
build stack-cli's workspace deps in emulator CI
BilalG1 Apr 15, 2026
b03486e
fix emulator pull --pr/--run snapshot detection
BilalG1 Apr 15, 2026
0b3a9cf
fix sentinel marker path in docker/server entrypoint
BilalG1 Apr 15, 2026
cfdc882
Merge remote-tracking branch 'origin/dev' into local-emulator-qol-fixes
BilalG1 Apr 15, 2026
2c8ad4c
address unresolved PR review comments on snapshot resume path
BilalG1 Apr 15, 2026
76f9543
simplify emulator fast-start: tighter polls, drop dead wrappers
BilalG1 Apr 15, 2026
3586115
fix snapshot resume host fs + restore standalone run-emulator.sh path
BilalG1 Apr 15, 2026
037755b
retry tsdown migration build to survive qemu-user futex hangs
BilalG1 Apr 15, 2026
894c1ce
fix CLI artifact download + build arm64 emulator on macOS runner
BilalG1 Apr 16, 2026
54ecda8
fix colima on GHA macOS: use QEMU backend instead of VZ driver
BilalG1 Apr 16, 2026
49a20ed
split arm64 build: Docker on Linux, QEMU snapshot on macOS
BilalG1 Apr 16, 2026
11531eb
fix check_deps: skip docker requirement when SKIP_DOCKER_BUILD=1
BilalG1 Apr 16, 2026
7534637
fix lint warning + remove invalid `local` in top-level loop
BilalG1 Apr 16, 2026
288b80e
fix empty array expansion under bash 3.2 (macOS)
BilalG1 Apr 16, 2026
d94aa66
capture emulator snapshot locally during pull instead of shipping fro…
BilalG1 Apr 16, 2026
7db9fe4
fix CI verify step: use freshly-built qcow2 via STACK_EMULATOR_HOME
BilalG1 Apr 16, 2026
510ef38
fix PCI slot mismatch in snapshot capture + stale runtime ISO on dire…
BilalG1 Apr 16, 2026
39b5c08
fix smoke test: skip shell ISO regen when CLI already wrote it
BilalG1 Apr 16, 2026
7acb3ed
fix capture path: guard against set -u + preserve cmd_capture's empty…
BilalG1 Apr 16, 2026
38974ca
Merge branch 'dev' into local-emulator-qol-fixes
BilalG1 Apr 20, 2026
8f9b9c1
emulator build: split snapshot-bake from savevm capture
BilalG1 Apr 20, 2026
fbd3207
seed: bump session activity events tx timeout to 30s
BilalG1 Apr 20, 2026
c8630c6
emulator: bump Postgres statement_timeout 30s → 120s
BilalG1 Apr 20, 2026
address unresolved PR review comments on snapshot resume path
- stop_vm no longer deletes runtime-config.iso; the CLI owns its
  lifecycle and the snapshot → cold-boot fallback needs it preserved
  (cmd_reset still wipes RUN_DIR for a full reset). Also sweeps qga.sock.
- Write internal-pck to \$VM_DIR on the host in snapshot mode. Cold boot
  publishes this via virtfs/9p; snapshot mode drops virtfs, so
  --config-file flows would otherwise hang. Handles both the rotation
  path (fresh PCK) and EMULATOR_NO_ROTATION (placeholder PCK).
- Pin RAM in snapshot mode to the build-time 4096 (overridable via
  EMULATOR_SNAPSHOT_RAM). Migration replay requires an identical -m
  value, same constraint as CPU count.
- Fail amd64 build when .savevm.zst is missing rather than shipping a
  cold-boot-only release silently. arm64 stays best-effort for now
  because it runs under TCG and can't be verified end-to-end.
- Install Node/pnpm on both arches. arm64 also runs
  generate-env-development.mjs, which otherwise relied on the runner
  image's preinstalled Node.
BilalG1 committed Apr 15, 2026
commit 2c8ad4c77a9588dad508351b4b1e7998a0f2aa9c
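The RAM and CPU pinning described in the commit message can be sketched as a small stand-alone helper. This is a minimal sketch under stated assumptions, not code from the PR: `resolve_snapshot_resources` is a hypothetical name, and the defaults of 4096 MB and 4 CPUs are the values that appear in the run-emulator.sh diff.

```shell
# Sketch: pick the -m / -smp values for a snapshot resume.
# resolve_snapshot_resources is a hypothetical helper, not part of run-emulator.sh.
resolve_snapshot_resources() {
  local requested_ram="$1" requested_cpus="$2"
  # Migration replay needs the same RAM and CPU count the snapshot was built
  # with, so the requested values are only used to decide whether to log.
  local ram="${EMULATOR_SNAPSHOT_RAM:-4096}"
  local cpus="${EMULATOR_SNAPSHOT_CPUS:-4}"
  if [ "$ram" != "$requested_ram" ]; then
    echo "Pinning RAM to ${ram}MB for snapshot resume (ignoring ${requested_ram})." >&2
  fi
  if [ "$cpus" != "$requested_cpus" ]; then
    echo "Pinning SMP to ${cpus} for snapshot resume (ignoring ${requested_cpus})." >&2
  fi
  # Emit "<ram> <cpus>" on stdout so a caller can read both values.
  printf '%s %s\n' "$ram" "$cpus"
}

# Example call: the caller asked for 8192 MB and 2 CPUs.
resolve_snapshot_resources 8192 2
```

Logging goes to stderr so that stdout carries only the resolved values; a caller could consume them with `read -r ram cpus`.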
13 changes: 10 additions & 3 deletions .github/workflows/qemu-emulator-build.yaml
@@ -55,13 +55,14 @@ jobs:
  - name: Set up Docker Buildx
  uses: docker/setup-buildx-action@v3

+ # Node/pnpm are needed on both arches: arm64 also runs
+ # generate-env-development.mjs inside build-image.sh. amd64 additionally
+ # builds and runs the CLI for the verification steps below.
  - uses: pnpm/action-setup@v4
- if: matrix.arch == 'amd64'
  with:
  version: 10.23.0

  - uses: actions/setup-node@v4
- if: matrix.arch == 'amd64'
  with:
  node-version: 22
  cache: pnpm
@@ -177,8 +178,14 @@ jobs:
  if [ -f "$SAVEVM" ]; then
  cp "$SAVEVM" "stack-emulator-${{ matrix.arch }}.savevm.zst"
  ls -lh "stack-emulator-${{ matrix.arch }}.savevm.zst"
+ elif [ "${{ matrix.arch }}" = "amd64" ]; then
+ # amd64 is the fast-resume contract: if the build didn't produce a
+ # snapshot, fail loudly rather than silently shipping a
+ # cold-boot-only release.
+ echo "ERROR: snapshot build expected to produce $SAVEVM for amd64." >&2
+ exit 1
  else
- echo "NOTE: no savevm snapshot was produced; fast-start will be unavailable for this arch."
+ echo "NOTE: no savevm snapshot was produced for ${{ matrix.arch }}; fast-start will be unavailable for this arch."
  fi

  - name: Upload image artifact
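The per-architecture policy in the workflow step above (hard failure on amd64, a note on other arches) can be exercised locally with a small function. This is a sketch under assumptions: `check_snapshot_artifact` is a hypothetical helper, and it takes the arch and snapshot path as arguments instead of reading `${{ matrix.arch }}` and `$SAVEVM`.

```shell
# Sketch: mirror the workflow's "snapshot present?" decision.
# check_snapshot_artifact is a hypothetical helper, not part of the workflow.
check_snapshot_artifact() {
  local arch="$1" savevm="$2"
  if [ -f "$savevm" ]; then
    echo "snapshot present: $savevm"
    return 0
  elif [ "$arch" = "amd64" ]; then
    # amd64 must ship a snapshot; a missing file is a build failure.
    echo "ERROR: snapshot build expected to produce $savevm for amd64." >&2
    return 1
  else
    # Other arches stay best-effort: report and carry on.
    echo "NOTE: no savevm snapshot was produced for $arch; fast-start will be unavailable for this arch."
    return 0
  fi
}
```

Returning a status instead of calling `exit` keeps the function usable from an interactive shell; a CI step would call it as `check_snapshot_artifact "$arch" "$SAVEVM" || exit 1`.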
64 changes: 47 additions & 17 deletions docker/local-emulator/qemu/run-emulator.sh
@@ -308,17 +308,25 @@ build_qemu_cmd() {
  # build and are not needed at runtime, but their virtio-blk slots must
  # exist so the migration replay matches device IDs. Runtime-only devices
  # (virtfs, balloon) live at higher slots — extra at destination is fine.
- local snapshot_args=() runtime_only_args=() snapshot_smp="$VM_CPUS"
+ local snapshot_args=() runtime_only_args=() snapshot_smp="$VM_CPUS" snapshot_ram="$VM_RAM"
  if snapshot_available; then
  log "Snapshot found at $savevm_file — fast-resume enabled."
  # -incoming defer: QEMU starts, waits for a QMP migrate-incoming command.
  # We use that to set mapped-ram + multifd capabilities before loading,
  # which enables parallel RAM restore (~2-3x faster than streamed decode).
  snapshot_args+=(-incoming defer)
  snapshot_smp="${EMULATOR_SNAPSHOT_CPUS:-4}"
+ # RAM size is baked into the snapshot; migration replay requires an
+ # identical -m value. Pin to the build-time RAM (4096) and ignore
+ # EMULATOR_RAM — override via EMULATOR_SNAPSHOT_RAM if a different
+ # snapshot was produced.
+ snapshot_ram="${EMULATOR_SNAPSHOT_RAM:-4096}"
  if [ "$snapshot_smp" != "$VM_CPUS" ]; then
  log "Pinning SMP to ${snapshot_smp} for snapshot resume (build-time value)."
  fi
+ if [ "$snapshot_ram" != "$VM_RAM" ]; then
+ log "Pinning RAM to ${snapshot_ram}MB for snapshot resume (ignoring EMULATOR_RAM=${VM_RAM})."
+ fi

  # Tiny placeholder ISOs to match the seed.iso / bundle.iso slots present
  # at snapshot time. Their content doesn't matter (cloud-init has already
@@ -351,7 +359,7 @@ build_qemu_cmd() {
  -cpu "$cpu"
  "${firmware_args[@]}"
  -boot order=c
- -m "$VM_RAM"
+ -m "$snapshot_ram"
  -smp "$snapshot_smp"
  -drive "file=$VM_DIR/disk.qcow2,format=qcow2,if=virtio"
  "${runtime_only_args[@]}"
@@ -502,14 +510,17 @@ qmp_incoming_and_cont() {
  return 1
  }

- # Generate fresh per-install secrets on the host. We pass them to the guest
- # through QGA's guest-exec input-data field (base64-encoded), so no host file
- # or virtfs mount is needed in the snapshot path.
- generate_fresh_secrets_payload() {
- printf 'STACK_SEED_INTERNAL_PROJECT_PUBLISHABLE_CLIENT_KEY=%s\n' "$(openssl rand -hex 32)"
- printf 'STACK_SEED_INTERNAL_PROJECT_SECRET_SERVER_KEY=%s\n' "$(openssl rand -hex 32)"
- printf 'STACK_SEED_INTERNAL_PROJECT_SUPER_SECRET_ADMIN_KEY=%s\n' "$(openssl rand -hex 32)"
- printf 'CRON_SECRET=%s\n' "$(openssl rand -hex 32)"
+ # Placeholder PCK baked into the snapshot. Kept in sync with the value in
+ # docker/local-emulator/qemu/cloud-init/emulator/user-data.
+ SNAPSHOT_PLACEHOLDER_PCK="00000000000000000000000000000000ffffffffffffffffffffffffffffffff"
+
+ # Write the internal PCK to the host path the CLI reads (see
+ # readInternalPck() in packages/stack-cli/src/commands/emulator.ts). In
+ # cold-boot mode the guest publishes this via virtfs/9p, but snapshot mode
+ # drops virtfs, so the host has to write it itself.
+ write_internal_pck_for_cli() {
+ local pck="$1"
+ (umask 077 && printf '%s' "$pck" > "$VM_DIR/internal-pck")
  }

  # Drive qemu-guest-agent via its virtserialport socket. QGA speaks the same
@@ -547,8 +558,22 @@ qga_trigger_fast_rotate() {
  # message is available in serial.log. We pipe the fresh-secrets env file
  # (as base64) to the script via input-data — keeps secrets off the
  # filesystem and avoids needing virtfs.
- local secrets_b64 resp pid
- secrets_b64=$(generate_fresh_secrets_payload | base64 | tr -d '\n')
+ local fresh_pck fresh_ssk fresh_sak fresh_cron payload secrets_b64 resp pid
+ fresh_pck="$(openssl rand -hex 32)"
+ fresh_ssk="$(openssl rand -hex 32)"
+ fresh_sak="$(openssl rand -hex 32)"
+ fresh_cron="$(openssl rand -hex 32)"
+ payload=$(
+ printf 'STACK_SEED_INTERNAL_PROJECT_PUBLISHABLE_CLIENT_KEY=%s\n' "$fresh_pck"
+ printf 'STACK_SEED_INTERNAL_PROJECT_SECRET_SERVER_KEY=%s\n' "$fresh_ssk"
+ printf 'STACK_SEED_INTERNAL_PROJECT_SUPER_SECRET_ADMIN_KEY=%s\n' "$fresh_sak"
+ printf 'CRON_SECRET=%s\n' "$fresh_cron"
+ )
+ # Publish the fresh PCK to the host path the CLI reads. Writing before the
+ # guest-exec so a --config-file flow that polls from another process can
+ # pick it up the moment rotation completes.
+ write_internal_pck_for_cli "$fresh_pck"
+ secrets_b64=$(printf '%s' "$payload" | base64 | tr -d '\n')
  local cmd
  cmd=$(printf '{"execute":"guest-exec","arguments":{"path":"/usr/local/bin/trigger-fast-rotate","capture-output":true,"input-data":"%s"}}' "$secrets_b64")
  resp=$(printf '%s\n' "$cmd" | qga_send || true)
@@ -599,8 +624,11 @@ stop_vm() {
  kill -9 "$pid" 2>/dev/null || true
  fi
  fi
- rm -f "$VM_DIR/qemu.pid" "$VM_DIR/monitor.sock" "$VM_DIR/serial.log"
- rm -f "$VM_DIR/runtime-config.iso"
+ rm -f "$VM_DIR/qemu.pid" "$VM_DIR/monitor.sock" "$VM_DIR/qga.sock" "$VM_DIR/serial.log"
+ # Do NOT remove runtime-config.iso: the CLI owns its lifecycle and run-emulator.sh
+ # cannot regenerate it. Removing here breaks the snapshot → cold-boot fallback
+ # (which calls stop_vm before recursing into cmd_start → ensure_runtime_config_iso).
+ # `cmd_reset` wipes $RUN_DIR entirely when a full reset is wanted.
  }

  cmd_start() {
@@ -642,6 +670,9 @@ cmd_start() {

  if [ "$EMULATOR_NO_ROTATION" = "1" ]; then
  warn "EMULATOR_NO_ROTATION=1: snapshot's placeholder secrets are in effect — do not expose this instance."
+ # The placeholder PCK is live in the running image; publish it to the
+ # host path so --config-file flows still work.
+ write_internal_pck_for_cli "$SNAPSHOT_PLACEHOLDER_PCK"
  if ! wait_for_condition "services" "$SNAPSHOT_READY_TIMEOUT" all_ready; then
  warn "Services did not respond after resume — falling back to cold boot."
  tail_vm_logs
@@ -691,9 +722,8 @@ cmd_start() {
  snapshot_fallback_to_cold_boot() {
  warn "Retrying with cold boot (EMULATOR_NO_SNAPSHOT=1)..."
  stop_vm
- # Wipe the overlay + fingerprint so build_qemu_cmd re-creates a fresh one,
- # but keep the CLI-generated runtime-config.iso (we can't regenerate it
- # from shell — the CLI owns that).
+ # Wipe the overlay + fingerprint so build_qemu_cmd re-creates a fresh one.
+ # runtime-config.iso is preserved by stop_vm (the CLI owns it).
  rm -f "$VM_DIR/disk.qcow2" "$VM_DIR/base-image.fingerprint" \
  "$VM_DIR/seed.phantom" "$VM_DIR/bundle.phantom"
  EMULATOR_NO_SNAPSHOT=1
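The rotation path in this diff does two things with the fresh publishable client key: it writes the key to an owner-only host file, and it base64-encodes the env payload into a guest-exec request. The sketch below reproduces that sequence in isolation. Several parts are assumptions: `VM_DIR` is a throwaway temp directory, `publish_and_encode` is a hypothetical helper, only two of the four secrets are included for brevity, and the request is printed instead of being sent through `qga_send`.

```shell
# Sketch only: VM_DIR is a temp dir here, not the emulator's real runtime dir.
VM_DIR="$(mktemp -d)"

write_internal_pck_for_cli() {
  local pck="$1"
  # The subshell scopes the umask, so the file is created owner-only (0600)
  # without changing the caller's umask.
  (umask 077 && printf '%s' "$pck" > "$VM_DIR/internal-pck")
}

# publish_and_encode is a hypothetical helper: it takes the secrets as
# arguments and prints the guest-exec JSON instead of sending it to QGA.
publish_and_encode() {
  local pck="$1" ssk="$2" payload secrets_b64
  payload=$(
    printf 'STACK_SEED_INTERNAL_PROJECT_PUBLISHABLE_CLIENT_KEY=%s\n' "$pck"
    printf 'STACK_SEED_INTERNAL_PROJECT_SECRET_SERVER_KEY=%s\n' "$ssk"
  )
  # Publish the PCK before triggering rotation so a polling reader can see it.
  write_internal_pck_for_cli "$pck"
  # Strip line wraps: the base64 text goes into a single JSON string field.
  secrets_b64=$(printf '%s' "$payload" | base64 | tr -d '\n')
  printf '{"execute":"guest-exec","arguments":{"path":"/usr/local/bin/trigger-fast-rotate","capture-output":true,"input-data":"%s"}}\n' "$secrets_b64"
}
```

With real secrets the call would look like `publish_and_encode "$(openssl rand -hex 32)" "$(openssl rand -hex 32)"`. Passing the values in as arguments keeps the sketch testable without `openssl`.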