Describe the bug
While debugging Windows virtio-win 0.1.285 NetKVM driver issues (#7925), we observed a scenario where the guest programs a PCI BAR to an address that falls outside the MMIO64 allocator range. This causes move_bar to fail, but the BAR config register has already been updated by detect_bar_reprogramming(). The resulting inconsistency between config space and the MMIO bus mapping leads to virtio device activation failure (Number of enabled queues lower than min), crashing the VMM.
In our case, the trigger was a driver bug (wrong used_len in ctrl_queue response caused the driver to reset and attempt BAR reprogramming to a high address). After fixing the driver-side issue, this crash path is no longer hit. However, the underlying VMM behavior — crashing on a failed BAR move — seems worth hardening regardless of the trigger.
There are two independent issues:
1. BAR config register not rolled back on failed move
detect_bar_reprogramming() in pci/src/configuration.rs updates bars[].addr before move_bar is called. When move_bar fails (e.g., address outside allocator range), the config register says the BAR is at the new address, but the MMIO bus mapping is still at the old address. The guest reads back the new address, assumes it worked, and configures queues accordingly — but the device backend is at the old address. This causes queue setup to partially fail, and activate() rejects the device.
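The rollback idea can be sketched as follows. This is a minimal, self-contained illustration of the pattern, not the actual cloud-hypervisor code: the `Bar`/`Device` types and the `reprogram_bar` helper are invented for the example, and the `move_bar` failure is stood in for by a simple range check.

```rust
/// Simplified stand-in for a PCI BAR entry (illustrative only).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Bar {
    addr: u64,
    size: u64,
}

/// Simplified stand-in for a PCI device's config state.
struct Device {
    bars: Vec<Bar>,
}

impl Device {
    /// Try to move a BAR; on failure, restore the config register so the
    /// guest-visible address stays consistent with the MMIO bus mapping.
    fn reprogram_bar(&mut self, idx: usize, new_addr: u64, mmio_end: u64) -> Result<(), String> {
        let old = self.bars[idx];
        // detect_bar_reprogramming() has already written the new address:
        self.bars[idx].addr = new_addr;

        // Stand-in for move_bar: fail if the new range falls outside the
        // allocator's MMIO region.
        let out_of_range = new_addr
            .checked_add(old.size)
            .map_or(true, |end| end > mmio_end);
        if out_of_range {
            // Rollback: restore the old address so config space matches
            // the mapping that is actually still on the MMIO bus.
            self.bars[idx] = old;
            return Err(format!("failed allocating new MMIO range at {new_addr:#x}"));
        }
        Ok(())
    }
}
```

Without the rollback in the error branch, a guest read-back of the BAR would return `new_addr` even though the device backend still answers at the old address — exactly the inconsistency described above.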
2. MMIO64 allocator loses ~4 GiB at top of address space
create_mmio_allocators() computes segment size as (range / alignment) * alignment, which truncates up to one alignment unit (4 GiB with 4 GiB alignment). Addresses near the top of the physical address space fall in this gap and cannot be allocated. While this didn't cause our issue directly (it was the driver bug), it's a latent issue for hosts with 46-bit physical addressing where the gap is at ~64 TiB.
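The truncation and the proposed fix can be shown with the arithmetic alone. The helpers below are illustrative (only `create_mmio_allocators()` is the real function name from the report); the "last segment absorbs the remainder" behavior matches the second commit described later.

```rust
const GIB: u64 = 1 << 30;

/// Segment size as currently computed: integer division truncates,
/// losing up to one alignment unit (up to 4 GiB with 4 GiB alignment).
fn truncated_segment_size(range: u64, alignment: u64) -> u64 {
    (range / alignment) * alignment
}

/// Sketch of the hardening: the last PCI segment keeps the remainder,
/// so addresses at the top of the device area stay allocatable.
fn segment_size(range: u64, alignment: u64, is_last: bool) -> u64 {
    if is_last {
        range
    } else {
        truncated_segment_size(range, alignment)
    }
}
```

For example, a 46 GiB range with 4 GiB alignment truncates to 44 GiB, leaving a 2 GiB hole at the top; with the last-segment rule the full range is usable.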
Error log (before fix):
```
WARN: Failed moving device BAR: failed allocating new MMIO range: 0x3fffffd80000->0x3fffffe00000(0x80000)
ERROR: Number of enabled queues lower than min: 1 vs 2
Fatal error: VmmThread(ActivateVirtioDevices(...))
```
Version
cloud-hypervisor v51.0.0 (main branch)
VM configuration
```
cloud-hypervisor \
  --kernel CLOUDHV.fd \
  --disk path=windows.qcow2,num_queues=2,queue_size=512 \
  --cpus boot=2,kvm_hyperv=on \
  --memory size=4G \
  --net tap=tap0,mac=52:54:00:12:34:56,num_queues=2 \
  --serial tty --console off
```
Guest: Windows 11 25H2, virtio-win 0.1.285
Host: Linux 6.17, 46-bit physical addressing
Defensive fixes
We have two commits that harden the VMM against this class of failure. We're not sure whether upstream will consider this worth merging, since the original trigger was a driver issue, but we're sharing it in case it's useful:
Branch: https://github.com/CMGS/cloud-hypervisor/tree/fix/bar-rollback-defensive
- pci: rollback BAR address on failed move_bar — restores the config register and bar_regions to their old values when move_bar fails, keeping device state consistent with the MMIO mapping
- vmm: extend last MMIO64 allocator to cover full range — gives the last PCI segment allocator all remaining space up to end of device area, eliminating the alignment truncation gap
Logs
N/A — the crash is deterministic when triggered.