Drain per-host reservation when a VM starts on a different host by Kukunin · Pull Request #13363 · apache/cloudstack

Kukunin · 2026-06-06T08:27:08Z

What's happening

When a VM is stopped via the API, CloudStack moves its CPU/RAM from used to reserved on its last_host_id (see the Stopping → Stopped + OperationSucceeded branch in CapacityManagerImpl.postStateTransitionEvent). The idea is that a quick restart on the same host can reclaim its earmarked slot cheaply.

The asymmetry: when the VM later starts on the same host, the fromLastHost=true branch in allocateVmCapacity drains that reservation. When it starts on a different host, the old reservation just sits there. It only clears when:

capacity.skipcounting.hours (default 1h) elapses and updateCapacityForHost recycles it, or
the VM is destroyed/expunged.

Until then, the orphan reservation gets summed into the cluster's used + reserved aggregate by FirstFitPlanner.removeClustersCrossingThreshold. On a cluster that's already near cluster.memory.allocated.capacity.disablethreshold (default 0.85), this phantom reservation can push the aggregate over the threshold and block subsequent VM starts in the whole cluster, even though the VM in question isn't actually consuming anything on its old host.

We hit this on a fairly full cluster where stop-then-start cycles started failing intermittently with InsufficientServerCapacityException: No destination found. The "ghost" capacity was the released-but-not-drained reservation from the last stop.
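To make the failure mode concrete, here is a minimal sketch of the arithmetic, not the planner's actual code. The class name, method name, and capacity figures are hypothetical, and the strict "greater than" comparison is an assumption about how the threshold is applied.

```java
// Illustrative sketch only: models a used + reserved threshold check.
// Names and numbers are hypothetical, not taken from CloudStack.
final class ClusterThresholdSketch {

    /** Returns true when (used + reserved) / total exceeds the disable threshold. */
    static boolean crossesThreshold(long usedMb, long reservedMb, long totalMb, double threshold) {
        return (double) (usedMb + reservedMb) / totalMb > threshold;
    }

    public static void main(String[] args) {
        long totalMb = 100_000;   // cluster memory capacity
        long usedMb = 84_000;     // memory genuinely in use (84%)
        long orphanMb = 2_048;    // reservation left behind on the old host
        double threshold = 0.85;  // default disable threshold quoted above

        // Without the orphan the cluster is still eligible: prints false.
        System.out.println(crossesThreshold(usedMb, 0, totalMb, threshold));
        // With the orphan counted the cluster is excluded: prints true.
        System.out.println(crossesThreshold(usedMb, orphanMb, totalMb, threshold));
    }
}
```

In this example a 2 GB reservation that no VM is using moves the cluster from 84% to roughly 86% and is enough to exclude it from placement.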

The fix

postStateTransitionEvent now drains the VM's reservation on its previous host before allocating on the target host — regardless of whether the target is the same host or a different one. Treating both cases identically removes the fromLastHost asymmetry.

if ((newState == State.Starting || newState == State.Migrating || event == Event.AgentReportMigrated) && vm.getHostId() != null) {
    if (vm.getLastHostId() != null) {
        releaseVmCapacity(vm, true, false, vm.getLastHostId());
    }
    allocateVmCapacity(vm);
}

Side effects of unifying the path:

The fromLastHost=true branch in allocateVmCapacity is now unreachable from postStateTransitionEvent. The only other caller (VirtualMachineManagerImpl#reconfiguringOnExistingHost) already passes false, so the parameter is removed entirely.
Fixes the long-standing moveToReservered typo (a stray extra "er"): renamed to moveToReserved throughout the interface, the impl, and the debug logs.
One logger.debug(String.format(...)) in postStateTransitionEvent switched to SLF4J {} placeholders to match the rest of the file.
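The drain-then-allocate ordering can be illustrated with a toy in-memory ledger. This is a sketch under simplifying assumptions, not CapacityManagerImpl: the class and its methods are hypothetical, only memory is tracked, and hosts are plain numeric IDs.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of per-host used/reserved memory. Hypothetical names throughout.
final class ReservationLedgerSketch {
    private final Map<Long, Long> usedMb = new HashMap<>();
    private final Map<Long, Long> reservedMb = new HashMap<>();

    long used(long hostId) { return usedMb.getOrDefault(hostId, 0L); }
    long reserved(long hostId) { return reservedMb.getOrDefault(hostId, 0L); }

    /** First deployment: no previous host, so nothing to drain. */
    void deploy(long hostId, long vmMb) { start(null, hostId, vmMb); }

    /** Stop: move the VM's memory from used to reserved on its host. */
    void stop(long hostId, long vmMb) {
        usedMb.merge(hostId, -vmMb, Long::sum);
        reservedMb.merge(hostId, vmMb, Long::sum);
    }

    /** Start: always drain the last host's reservation, then allocate on the target. */
    void start(Long lastHostId, long targetHostId, long vmMb) {
        if (lastHostId != null) {
            // Never drain more than is actually reserved on the last host.
            long drained = Math.min(reserved(lastHostId), vmMb);
            reservedMb.merge(lastHostId, -drained, Long::sum);
        }
        usedMb.merge(targetHostId, vmMb, Long::sum);
    }

    public static void main(String[] args) {
        ReservationLedgerSketch ledger = new ReservationLedgerSketch();
        ledger.deploy(1L, 128L);      // VM runs on host 1
        ledger.stop(1L, 128L);        // 128 MB becomes reserved on host 1
        ledger.start(1L, 2L, 128L);   // VM restarts on host 2
        System.out.println("host 1 reserved: " + ledger.reserved(1L));
        System.out.println("host 2 used: " + ledger.used(2L));
    }
}
```

Because start() drains unconditionally, the same-host and cross-host cases go through one code path, which is the property the unified branch in postStateTransitionEvent is meant to provide.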

Tests

Two new tests in CapacityManagerImplTest cover both transitions:

testPostStateTransitionReleasesStaleReservationWhenStartingOnDifferentHost — fails on main, passes with the fix. Verifies releaseVmCapacity(vm, true, false, oldHostId) is invoked.
testPostStateTransitionReleasesReservationWhenStartingOnSameHost — guards the unified contract so both paths stay in lockstep.

Manual validation

Reproduced on a small test cluster:

Deploy a VM on host A, stop it (reserved=128M appears on A in op_host_capacity).
Force a start on host B via hostid.
Logs show the inline drain: release mem from host: A, old reserved: 128MB → new reserved: 0.
After the VM reaches Running on B, last_host_id updates to B and updateCapacityForHost agrees the reservation is gone.

Same-host stop/start round-trip continues to behave the same way as before.

When a VM is stopped via the API, postStateTransitionEvent moves its used capacity into reserved_capacity on its last host (so a quick restart can reclaim it cheaply). Until now, this reservation was only drained when the VM started again on the same lastHostId. Starting on any other host left an orphan reservation that lingered for up to one hour (capacity.skipcounting.hours) and was summed into the cluster used+reserved aggregate by FirstFitPlanner.removeClustersCrossingThreshold, spuriously tripping the disable-threshold and blocking later starts.

Always drain the VM's reservation on its previous host before allocating on the target host, same host or not.

This removes the fromLastHost branch (now dead, the only other caller already passes false), the matching boolean parameter on allocateVmCapacity, and the misspelt moveToReservered parameter everywhere it appears.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

boring-cyborg Bot added component:compute component:orchestration labels Jun 6, 2026

Kukunin wants to merge 1 commit into apache:main from Kukunin:fix-stale-host-reservation-on-cross-host-start
