Skip to content

Drain per-host reservation when a VM starts on a different host#13363

Open
Kukunin wants to merge 1 commit into
apache:mainfrom
Kukunin:fix-stale-host-reservation-on-cross-host-start
Open

Drain per-host reservation when a VM starts on a different host#13363
Kukunin wants to merge 1 commit into
apache:mainfrom
Kukunin:fix-stale-host-reservation-on-cross-host-start

Conversation

@Kukunin
Copy link
Copy Markdown
Contributor

@Kukunin Kukunin commented Jun 6, 2026

What's happening

When a VM is stopped via the API, CloudStack moves its CPU/RAM from used to reserved on its last_host_id (see the Stopping → Stopped + OperationSucceeded branch in CapacityManagerImpl.postStateTransitionEvent). The idea is that a quick restart on the same host can reclaim its earmarked slot cheaply.

The asymmetry: when the VM later starts on the same host, the fromLastHost=true branch in allocateVmCapacity drains that reservation. When it starts on a different host, the old reservation just sits there. It only clears when:

  • capacity.skipcounting.hours (default 1h) elapses and updateCapacityForHost recycles it, or
  • the VM is destroyed/expunged.

Until then, the orphan reservation gets summed into the cluster's used + reserved aggregate by FirstFitPlanner.removeClustersCrossingThreshold. On a cluster that's already near cluster.memory.allocated.capacity.disablethreshold (default 0.85), the phantom can trip the threshold and block subsequent VM starts in the whole cluster — even though the VM in question isn't actually consuming anything on its old host.

We hit this on a fairly full cluster where stop-then-start cycles started failing intermittently with InsufficientServerCapacityException: No destination found. The "ghost" capacity was the released-but-not-drained reservation from the last stop.

The fix

postStateTransitionEvent now drains the VM's reservation on its previous host before allocating on the target host — regardless of whether the target is the same host or a different one. Treating both cases identically removes the fromLastHost asymmetry.

if ((newState == State.Starting || newState == State.Migrating || event == Event.AgentReportMigrated) && vm.getHostId() != null) {
    if (vm.getLastHostId() != null) {
        releaseVmCapacity(vm, true, false, vm.getLastHostId());
    }
    allocateVmCapacity(vm);
}

Side effects of unifying the path:

  • The fromLastHost=true branch in allocateVmCapacity is now unreachable from postStateTransitionEvent. The only other caller (VirtualMachineManagerImpl#reconfiguringOnExistingHost) already passes false, so the parameter is removed entirely.
  • Fixes the long-standing moveToReservered typo (3 e's) — renamed to moveToReserved throughout the interface, the impl, and the debug logs.
  • One logger.debug(String.format(...)) in postStateTransitionEvent switched to SLF4J {} placeholders to match the rest of the file.

Tests

Two new tests in CapacityManagerImplTest cover both transitions:

  • testPostStateTransitionReleasesStaleReservationWhenStartingOnDifferentHost — fails on main, passes with the fix. Verifies releaseVmCapacity(vm, true, false, oldHostId) is invoked.
  • testPostStateTransitionReleasesReservationWhenStartingOnSameHost — guards the unified contract so both paths stay in lockstep.

Manual validation

Reproduced on a small test cluster:

  1. Deploy a VM on host A, stop it (reserved=128M appears on A in op_host_capacity).
  2. Force a start on host B via hostid.
  3. Logs show the inline drain: release mem from host: A, old reserved: 128MB → new reserved: 0.
  4. After the VM reaches Running on B, last_host_id updates to B and updateCapacityForHost agrees the reservation is gone.

Same-host stop/start round-trip continues to behave the same way as before.

When a VM is stopped via the API, postStateTransitionEvent moves its
used capacity into reserved_capacity on its last host (so a quick
restart can reclaim it cheaply). Until now, this reservation was only
drained when the VM started again on the same lastHostId. Starting on
any other host left an orphan reservation that lingered for up to one
hour (capacity.skipcounting.hours) and was summed into the cluster
used+reserved aggregate by FirstFitPlanner.removeClustersCrossingThreshold,
spuriously tripping the disable-threshold and blocking later starts.

Always drain the VM's reservation on its previous host before
allocating on the target host — same host or not. This removes the
fromLastHost branch (now dead, the only other caller already passes
false), the matching boolean parameter on allocateVmCapacity, and the
misspelt moveToReservered parameter everywhere it appears.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant